API Reference
DataFrame
-
class
cudf.core.dataframe.DataFrame(data=None, index=None, columns=None, dtype=None)
A GPU DataFrame object.
- Parameters
- data: array-like, Iterable, dict, or DataFrame
Dict can contain Series, arrays, constants, or list-like objects.
- index: Index or array-like
Index to use for the resulting frame. Defaults to RangeIndex if the input data carries no indexing information and no index is provided.
- columns: Index or array-like
Column labels to use for the resulting frame. Defaults to RangeIndex (0, 1, 2, …, n) if no column labels are provided.
- dtype: dtype, default None
Data type to force. Only a single dtype is allowed. If None, infer.
Examples
Build DataFrame with __setitem__:
>>> import cudf
>>> df = cudf.DataFrame()
>>> df['key'] = [0, 1, 2, 3, 4]
>>> df['val'] = [float(i + 10) for i in range(5)]  # insert column
>>> print(df)
   key   val
0    0  10.0
1    1  11.0
2    2  12.0
3    3  13.0
4    4  14.0
Build DataFrame via dict of columns:
>>> import cudf
>>> import numpy as np
>>> from datetime import datetime, timedelta
>>> t0 = datetime.strptime('2018-10-07 12:00:00', '%Y-%m-%d %H:%M:%S')
>>> n = 5
>>> df = cudf.DataFrame({
...     'id': np.arange(n),
...     'datetimes': np.array(
...         [(t0 + timedelta(seconds=x)) for x in range(n)])
... })
>>> df
   id                datetimes
0   0  2018-10-07T12:00:00.000
1   1  2018-10-07T12:00:01.000
2   2  2018-10-07T12:00:02.000
3   3  2018-10-07T12:00:03.000
4   4  2018-10-07T12:00:04.000
Build DataFrame via list of rows as tuples:
>>> import cudf
>>> import numpy as np
>>> df = cudf.DataFrame([
...     (5, "cats", "jump", np.nan),
...     (2, "dogs", "dig", 7.5),
...     (3, "cows", "moo", -2.1, "occasionally"),
... ])
>>> df
   0     1     2     3             4
0  5  cats  jump  null          None
1  2  dogs   dig   7.5          None
2  3  cows   moo  -2.1  occasionally
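The ragged third row above is why column 4 is None in the first two rows. As a rough CPU model of that row-to-column transposition (not cuDF's implementation), the same rows can be turned into columns with the standard library's itertools.zip_longest; np.nan is replaced by None here so the sketch needs no third-party imports.

```python
from itertools import zip_longest

rows = [
    (5, "cats", "jump", None),
    (2, "dogs", "dig", 7.5),
    (3, "cows", "moo", -2.1, "occasionally"),
]

# Transpose rows into columns. Shorter rows are padded with None so that
# every column has one entry per row; columns are labelled 0, 1, 2, ...
columns = {label: list(values)
           for label, values in enumerate(zip_longest(*rows, fillvalue=None))}

for label, values in columns.items():
    print(label, values)
```

The padding value mirrors the None entries in column 4 of the output above.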
Convert from a Pandas DataFrame:
>>> import pandas as pd
>>> import cudf
>>> pdf = pd.DataFrame({'a': [0, 1, 2, 3], 'b': [0.1, 0.2, None, 0.3]})
>>> df = cudf.from_pandas(pdf)
>>> df
   a    b
0  0  0.1
1  1  0.2
2  2  nan
3  3  0.3
- Attributes
T: Transpose index and columns.
at: Alias for DataFrame.loc; provided for compatibility with Pandas.
columns: Returns a tuple of columns.
dtypes: Return the dtypes in this object.
empty: Indicator whether DataFrame or Series is empty.
iat: Alias for DataFrame.iloc; provided for compatibility with Pandas.
iloc: Selecting rows and columns by position.
index: Returns the index of the DataFrame.
loc: Selecting rows and columns by label or boolean mask.
ndim: Dimension of the data.
shape: Returns a tuple representing the dimensionality of the DataFrame.
size: Return the number of elements in the underlying data.
values: Return a CuPy representation of the DataFrame.
Methods
acos(): Get trigonometric inverse cosine, element-wise.
add(other[, axis, level, fill_value]): Get Addition of dataframe and other, element-wise (binary operator add).
add_column(name, data[, forceindex]): Add a column.
all([axis, bool_only, skipna, level]): Return whether all elements are True in DataFrame.
any([axis, bool_only, skipna, level]): Return whether any element is True in DataFrame.
append(other[, ignore_index, …]): Append rows of other to the end of caller, returning a new object.
apply_chunks(func, incols, outcols[, …]): Transform user-specified chunks using the user-provided function.
apply_rows(func, incols, outcols, kwargs[, …]): Apply a row-wise user defined function.
argsort([ascending, na_position]): Return the indices that would sort the DataFrame by its values.
as_gpu_matrix([columns, order]): Convert to a matrix in device memory.
as_matrix([columns]): Convert to a matrix in host memory.
asin(): Get trigonometric inverse sine, element-wise.
assign(**kwargs): Assign columns to DataFrame from keyword arguments.
astype(dtype[, copy, errors]): Cast the DataFrame to the given dtype.
atan(): Get trigonometric inverse tangent, element-wise.
clip([lower, upper, inplace, axis]): Trim values at input threshold(s).
copy([deep]): Returns a copy of this dataframe.
corr(): Compute the correlation matrix of a DataFrame.
cos(): Get trigonometric cosine, element-wise.
count([axis, level, numeric_only]): Count non-NA cells for each column or row.
cov(**kwargs): Compute the covariance matrix of a DataFrame.
cummax([axis, skipna]): Return cumulative maximum of the DataFrame.
cummin([axis, skipna]): Return cumulative minimum of the DataFrame.
cumprod([axis, skipna]): Return cumulative product of the DataFrame.
cumsum([axis, skipna]): Return cumulative sum of the DataFrame.
describe([percentiles, include, exclude]): Compute summary statistics of a DataFrame’s columns.
div(other[, axis, level, fill_value]): Get Floating division of dataframe and other, element-wise (binary operator truediv).
drop([labels, axis, columns, errors, inplace]): Drop column(s).
drop_column(name): Drop a column by name.
drop_duplicates([subset, keep, inplace, …]): Return DataFrame with duplicate rows removed, optionally considering only a subset of columns.
dropna([axis, how, thresh, subset, inplace]): Drop rows (or columns) containing nulls.
equals(other): Test whether two objects contain the same elements.
exp(): Get the exponential of all elements, element-wise.
fillna(value[, method, axis, inplace, limit]): Fill null values with value.
floordiv(other[, axis, level, fill_value]): Get Integer division of dataframe and other, element-wise (binary operator floordiv).
from_arrow(table): Convert from a PyArrow Table.
from_gpu_matrix(data[, index, columns, …]): Convert from a numba gpu ndarray.
from_pandas(dataframe[, nan_as_null]): Convert from a Pandas DataFrame.
from_records(data[, index, columns, nan_as_null]): Convert structured or record ndarray to DataFrame.
groupby([by, axis, level, as_index, sort, …]): Group DataFrame using a mapper or by a Series of columns.
hash_columns([columns]): Hash the given columns and return a new device array.
head([n]): Returns the first n rows as a new DataFrame.
info([verbose, buf, max_cols, memory_usage, …]): Print a concise summary of a DataFrame.
insert(loc, name, value): Add a column to DataFrame at the index specified by loc.
interleave_columns(): Interleave Series columns of a table into a single column.
isin(values): Whether each element in the DataFrame is contained in values.
isna(): Identify missing values.
isnull(): Identify missing values.
iteritems(): Iterate over column names and series pairs.
join(other[, on, how, lsuffix, rsuffix, …]): Join columns with other DataFrame on index or on a key column.
keys(): Get the columns.
kurt([axis, skipna, level, numeric_only]): Return Fisher’s unbiased kurtosis of a sample.
kurtosis([axis, skipna, level, numeric_only]): Return Fisher’s unbiased kurtosis of a sample.
label_encoding(column, prefix, cats[, …]): Encode labels in a column with label encoding.
log(): Get the natural logarithm of all elements, element-wise.
mask(cond[, other, inplace]): Replace values where the condition is True.
max([axis, skipna, dtype, level, numeric_only]): Return the maximum of the values in the DataFrame.
mean([axis, skipna, level, numeric_only]): Return the mean of the values for the requested axis.
melt(**kwargs): Unpivot a DataFrame from wide format to long format, optionally leaving identifier variables set.
memory_usage([index, deep]): Return the memory usage of each column in bytes.
merge(right[, on, left_on, right_on, …]): Merge GPU DataFrame objects by performing a database-style join operation by columns or indexes.
min([axis, skipna, dtype, level, numeric_only]): Return the minimum of the values in the DataFrame.
mod(other[, axis, level, fill_value]): Get Modulo division of dataframe and other, element-wise (binary operator mod).
mul(other[, axis, level, fill_value]): Get Multiplication of dataframe and other, element-wise (binary operator mul).
nans_to_nulls(): Convert nans (if any) to nulls.
nlargest(n, columns[, keep]): Get the rows of the DataFrame sorted by the n largest values of columns.
notna(): Identify non-missing values.
notnull(): Identify non-missing values.
nsmallest(n, columns[, keep]): Get the rows of the DataFrame sorted by the n smallest values of columns.
one_hot_encoding(column, prefix, cats[, …]): Expand a column with one-hot-encoding.
partition_by_hash(columns, nparts[, keep_index]): Partition the dataframe by the hashed value of data in columns.
pop(item): Return a column and drop it from the DataFrame.
pow(other[, axis, level, fill_value]): Get Exponential power of dataframe and other, element-wise (binary operator pow).
prod([axis, skipna, dtype, level, …]): Return product of the values in the DataFrame.
product([axis, skipna, dtype, level, …]): Return product of the values in the DataFrame.
quantile([q, axis, numeric_only, …]): Return values at the given quantile.
quantiles([q, interpolation]): Return values at the given quantile.
query(expr[, local_dict]): Query with a boolean expression using Numba to compile a GPU kernel.
radd(other[, axis, level, fill_value]): Get Addition of dataframe and other, element-wise (binary operator radd).
rank([axis, method, numeric_only, …]): Compute numerical data ranks (1 through n) along axis.
rdiv(other[, axis, level, fill_value]): Get Floating division of dataframe and other, element-wise (binary operator rtruediv).
reindex([labels, axis, index, columns, copy]): Return a new DataFrame whose axes conform to a new index.
rename([mapper, index, columns, axis, copy, …]): Alter column and index labels.
repeat(repeats[, axis]): Repeats elements consecutively.
replace([to_replace, value, inplace, limit, …]): Replace values given in to_replace with replacement.
reset_index([level, drop, inplace, …]): Reset the index.
rfloordiv(other[, axis, level, fill_value]): Get Integer division of dataframe and other, element-wise (binary operator rfloordiv).
rmod(other[, axis, level, fill_value]): Get Modulo division of dataframe and other, element-wise (binary operator rmod).
rmul(other[, axis, level, fill_value]): Get Multiplication of dataframe and other, element-wise (binary operator rmul).
rolling(window[, min_periods, center, axis, …]): Rolling window calculations.
rpow(other[, axis, level, fill_value]): Get Exponential power of dataframe and other, element-wise (binary operator rpow).
rsub(other[, axis, level, fill_value]): Get Subtraction of dataframe and other, element-wise (binary operator rsub).
rtruediv(other[, axis, level, fill_value]): Get Floating division of dataframe and other, element-wise (binary operator rtruediv).
sample([n, frac, replace, weights, …]): Return a random sample of items from an axis of object.
scatter_by_map(map_index[, map_size, keep_index]): Scatter to a list of dataframes.
searchsorted(values[, side, ascending, …]): Find indices where elements should be inserted to maintain order.
select_dtypes([include, exclude]): Return a subset of the DataFrame’s columns based on the column dtypes.
set_index(index[, drop]): Return a new DataFrame with a new index.
shift([periods, freq, axis, fill_value]): Shift values by periods positions.
sin(): Get trigonometric sine, element-wise.
skew([axis, skipna, level, numeric_only]): Return unbiased Fisher-Pearson skew of a sample.
sort_index([axis, level, ascending, …]): Sort object by labels (along an axis).
sort_values(by[, axis, ascending, inplace, …]): Sort by the values row-wise.
sqrt(): Get the non-negative square-root of all elements, element-wise.
stack([level, dropna]): Stack the prescribed level(s) from columns to index.
std([axis, skipna, level, ddof, numeric_only]): Return sample standard deviation of the DataFrame.
sub(other[, axis, level, fill_value]): Get Subtraction of dataframe and other, element-wise (binary operator sub).
sum([axis, skipna, dtype, level, …]): Return sum of the values in the DataFrame.
tail([n]): Returns the last n rows as a new DataFrame.
take(positions[, keep_index]): Return a new DataFrame containing the rows specified by positions.
tan(): Get trigonometric tangent, element-wise.
tile(count): Repeats the rows from self DataFrame count times to form a new DataFrame.
to_arrow([preserve_index]): Convert to a PyArrow Table.
to_csv([path, sep, na_rep, columns, header, …]): Write a dataframe to csv file format.
to_dlpack(): Converts a cuDF object into a DLPack tensor.
to_feather(path, *args, **kwargs): Write a DataFrame to the feather format.
to_gpu_matrix(): Convert to a numba gpu ndarray.
to_hdf(path_or_buf, key, *args, **kwargs): Write the contained data to an HDF5 file using HDFStore.
to_json([path_or_buf]): Convert the cuDF object to a JSON string.
to_orc(fname[, compression]): Write a DataFrame to the ORC format.
to_pandas(**kwargs): Convert to a Pandas DataFrame.
to_parquet(path, *args, **kwargs): Write a DataFrame to the parquet format.
to_records([index]): Convert to a numpy recarray.
to_string(): Convert to string.
transpose(): Transpose index and columns.
truediv(other[, axis, level, fill_value]): Get Floating division of dataframe and other, element-wise (binary operator truediv).
var([axis, skipna, level, ddof, numeric_only]): Return unbiased variance of the DataFrame.
where(cond[, other, inplace]): Replace values where the condition is False.
-
property
T
Transpose index and columns.
Reflect the DataFrame over its main diagonal by writing rows as columns and vice-versa. The property T is an accessor to the method transpose().
- Returns
- out: DataFrame
The transposed DataFrame.
-
acos()
Get trigonometric inverse cosine, element-wise.
The inverse of cos, so that if y = x.cos(), then x = y.acos() for values of x in [0, π].
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64
acos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629
acos operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0,
              1.5707963267948966,  1.266103672779499],
             dtype='float64')
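The round-trip identity for acos only holds on its principal range [0, π]; outside it, acos returns the equivalent angle folded back into that range. A minimal sketch with Python's standard math module (not cuDF) shows both cases; the helper name roundtrips is hypothetical.

```python
import math

def roundtrips(x: float) -> bool:
    """Return True if acos(cos(x)) recovers x, within floating-point tolerance."""
    return math.isclose(math.acos(math.cos(x)), x, abs_tol=1e-12)

# Inside the principal range [0, pi] the identity holds...
inside = [roundtrips(x) for x in (0.0, 0.5, 1.0, math.pi)]

# ...but 4.0 lies outside it, so acos returns 2*pi - 4.0 instead.
outside = roundtrips(4.0)
folded = math.acos(math.cos(4.0))

print(inside, outside, folded)
```

The same caveat applies to asin on [-π/2, π/2] and atan on (-π/2, π/2).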
-
add(other, axis='columns', level=None, fill_value=None)
Get Addition of dataframe and other, element-wise (binary operator add).
Equivalent to dataframe + other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, radd.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- other: scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- fill_value: float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'angles': [0, 3, 4],
...                      'degrees': [360, 180, 360]},
...                     index=['circle', 'triangle', 'rectangle'])
>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
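The fill_value rule described above can be modelled on plain Python lists, with None standing in for a missing (NaN/null) value. add_with_fill is a hypothetical helper that illustrates the documented semantics; it is not how cuDF implements add.

```python
from typing import List, Optional

Value = Optional[float]

def add_with_fill(left: List[Value], right: List[Value],
                  fill_value: Value = None) -> List[Value]:
    """Element-wise left + right.

    A value missing on exactly one side is replaced by fill_value (if given);
    a value missing on both sides stays missing."""
    out: List[Value] = []
    for a, b in zip(left, right):
        if a is None and b is None:
            out.append(None)  # missing in both inputs: result is missing
            continue
        if fill_value is not None:
            a = fill_value if a is None else a
            b = fill_value if b is None else b
        out.append(None if a is None or b is None else a + b)
    return out

print(add_with_fill([1.0, None, None], [10.0, 5.0, None]))
print(add_with_fill([1.0, None, None], [10.0, 5.0, None], fill_value=0.0))
```

Without fill_value the middle element stays missing; with fill_value=0.0 it becomes 5.0, while the last element stays missing in both calls.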
-
add_column(name, data, forceindex=False)
Add a column.
- Parameters
- name: str
Name of column to be added.
- data: Series, array-like
Values to be added.
-
all(axis=0, bool_only=None, skipna=True, level=None, **kwargs)
Return whether all elements are True in DataFrame.
- Parameters
- skipna: bool, default True
Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be True, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.
- Returns
- Series
Notes
Parameters currently not supported are axis, bool_only, level.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [3, 2, 3, 4], 'b': [7, 0, 10, 10]})
>>> df.all()
a     True
b    False
dtype: bool
-
any(axis=0, bool_only=None, skipna=True, level=None, **kwargs)
Return whether any element is True in DataFrame.
- Parameters
- skipna: bool, default True
Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be False, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.
- Returns
- Series
Notes
Parameters currently not supported are axis, bool_only, level.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [3, 2, 3, 4], 'b': [7, 0, 10, 10]})
>>> df.any()
a    True
b    True
dtype: bool
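The skipna rules for all() and any() differ only in how an empty or all-NA column reduces. A plain-Python model of one column (None standing in for NA) makes the contrast concrete; all_reduce and any_reduce are hypothetical helpers written from the parameter descriptions above, not cuDF functions.

```python
from typing import Iterable, List, Optional

def _as_bools(values: Iterable[Optional[float]], skipna: bool) -> List[bool]:
    # None stands in for NA: dropped when skipna=True, otherwise counted as True
    # (NA is "not equal to zero").
    if skipna:
        return [bool(v) for v in values if v is not None]
    return [True if v is None else bool(v) for v in values]

def all_reduce(values, skipna: bool = True) -> bool:
    return all(_as_bools(values, skipna))

def any_reduce(values, skipna: bool = True) -> bool:
    return any(_as_bools(values, skipna))

# Column 'b' from the examples: one zero, so all() is False and any() is True.
print(all_reduce([7, 0, 10, 10]), any_reduce([7, 0, 10, 10]))
# An all-NA column reduces like an empty one when skipna=True.
print(all_reduce([None, None]), any_reduce([None, None]))
# With skipna=False, NA counts as True.
print(any_reduce([None, None], skipna=False))
```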
-
append(other, ignore_index=False, verify_integrity=False, sort=False)
Append rows of other to the end of caller, returning a new object. Columns in other that are not in the caller are added as new columns.
- Parameters
- other: DataFrame or Series/dict-like object, or list of these
The data to append.
- ignore_index: bool, default False
If True, do not use the index labels.
- sort: bool, default False
Sort the column order if the columns of self and other are not aligned.
- verify_integrity: bool, default False
This parameter is currently not supported.
- Returns
- DataFrame
See also
cudf.concat: General function to concatenate DataFrame or Series objects.
Notes
If a list of dict/series is passed and the keys are all contained in the DataFrame’s index, the order of the columns in the resulting DataFrame will be unchanged. Iteratively appending rows to a cudf DataFrame can be more computationally intensive than a single concatenate. A better solution is to append those rows to a list and then concatenate the list with the original DataFrame all at once. verify_integrity parameter is not supported yet.
Examples
>>> import cudf
>>> df = cudf.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
>>> df
   A  B
0  1  2
1  3  4
>>> df2 = cudf.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
>>> df2
   A  B
0  5  6
1  7  8
>>> df.append(df2)
   A  B
0  1  2
1  3  4
0  5  6
1  7  8
With ignore_index set to True:
>>> df.append(df2, ignore_index=True)
   A  B
0  1  2
1  3  4
2  5  6
3  7  8
The following, while not recommended methods for generating DataFrames, show two ways to generate a DataFrame from multiple data sources. Less efficient:
>>> df = cudf.DataFrame(columns=['A'])
>>> for i in range(5):
...     df = df.append({'A': i}, ignore_index=True)
>>> df
   A
0  0
1  1
2  2
3  3
4  4
More efficient than above:
>>> cudf.concat([cudf.DataFrame([i], columns=['A']) for i in range(5)],
...             ignore_index=True)
   A
0  0
1  1
2  2
3  3
4  4
-
apply_chunks(func, incols, outcols, kwargs={}, pessimistic_nulls=True, chunks=None, blkct=None, tpb=None)
Transform user-specified chunks using the user-provided function.
- Parameters
- df: DataFrame
The source dataframe.
- func: function
The transformation function that will be executed on the CUDA GPU.
- incols: list or dict
A list of names of input columns that match the function arguments. Or, a dictionary mapping input column names to their corresponding function arguments such as {'col1': 'arg1'}.
- outcols: dict
A dictionary of output column names and their dtype.
- kwargs: dict
Name-value pairs of extra arguments. These values are passed directly into the function.
- pessimistic_nulls: bool
Whether or not the output should be null when any corresponding input is null. If False, all outputs will be non-null, but will be the result of applying func against the underlying column data, which may be garbage.
- chunks: int or Series-like
If it is an int, it is the chunk size. If it is an array, it contains the integer offset for the start of each chunk. The span of the i-th chunk is data[chunks[i] : chunks[i + 1]] for any i + 1 < chunks.size, or data[chunks[i]:] for i == len(chunks) - 1.
- tpb: int; optional
The threads-per-block for the underlying kernel. If not specified (default), uses the Numba .forall(...) built-in to query the CUDA Driver API to determine the optimal kernel launch configuration. Specify 1 to emulate serial execution for each chunk; this is a good starting point but inefficient. Its maximum possible value is limited by the available CUDA GPU resources.
- blkct: int; optional
The number of blocks for the underlying kernel. If neither blkct nor tpb is specified (default), uses the Numba .forall(...) built-in to query the CUDA Driver API to determine the optimal kernel launch configuration. If blkct is not specified and tpb is specified, uses chunks as the number of blocks.
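The chunks rule above (an int chunk size, or a list of start offsets where the last chunk runs to the end of the data) can be checked with a small pure-Python helper. chunk_spans is hypothetical, and expanding an int chunk size into offsets 0, size, 2*size, … is an assumption about how cuDF interprets it.

```python
from typing import List, Sequence, Union

def chunk_spans(n_rows: int, chunks: Union[int, Sequence[int]]) -> List[range]:
    """Return the row span covered by each chunk.

    An int is treated as a chunk size (assumed to start at row 0);
    a sequence holds the start offset of each chunk, and the last
    chunk runs to the end of the data."""
    if isinstance(chunks, int):
        starts = list(range(0, n_rows, chunks))
    else:
        starts = list(chunks)
    ends = starts[1:] + [n_rows]
    return [range(s, e) for s, e in zip(starts, ends)]

print(chunk_spans(10, 4))          # chunk size 4
print(chunk_spans(10, [0, 3, 7]))  # explicit start offsets
```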
See also
DataFrame.apply_rows
Examples
For tpb > 1, func is executed by tpb threads concurrently. To access the thread id and count, use numba.cuda.threadIdx.x and numba.cuda.blockDim.x, respectively (see the Numba CUDA kernel documentation).
In the example below, the kernel is invoked concurrently on each specified chunk. The kernel computes the corresponding output for the chunk.
By looping over range(cuda.threadIdx.x, in1.size, cuda.blockDim.x), the kernel function can be used efficiently with any tpb.
>>> from numba import cuda
>>> @cuda.jit
... def kernel(in1, in2, in3, out1):
...     for i in range(cuda.threadIdx.x, in1.size, cuda.blockDim.x):
...         x = in1[i]
...         y = in2[i]
...         z = in3[i]
...         out1[i] = x * y + z
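Why the strided loop works with any tpb: each simulated thread t visits indices t, t + tpb, t + 2*tpb, …, so together the threads cover every element of the chunk exactly once. The sketch below simulates this serially in plain Python; it does not use Numba, and the helper names are hypothetical.

```python
from collections import Counter

def strided_indices(thread_idx: int, size: int, block_dim: int) -> range:
    # Mirrors range(cuda.threadIdx.x, in1.size, cuda.blockDim.x) for one thread.
    return range(thread_idx, size, block_dim)

def coverage(size: int, block_dim: int) -> Counter:
    """Count how many simulated threads visit each element of a chunk."""
    visits: Counter = Counter()
    for tid in range(block_dim):
        visits.update(strided_indices(tid, size, block_dim))
    return visits

# Every index of a 10-element chunk is visited exactly once, whatever tpb is,
# including tpb larger than the chunk (surplus threads simply do nothing).
for tpb in (1, 3, 4, 16):
    assert coverage(10, tpb) == Counter(range(10))
```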
-
apply_rows(func, incols, outcols, kwargs, pessimistic_nulls=True, cache_key=None)
Apply a row-wise user defined function.
- Parameters
- df: DataFrame
The source dataframe.
- func: function
The transformation function that will be executed on the CUDA GPU.
- incols: list or dict
A list of names of input columns that match the function arguments. Or, a dictionary mapping input column names to their corresponding function arguments such as {'col1': 'arg1'}.
- outcols: dict
A dictionary of output column names and their dtype.
- kwargs: dict
Name-value pairs of extra arguments. These values are passed directly into the function.
- pessimistic_nulls: bool
Whether or not apply_rows output should be null when any corresponding input is null. If False, all outputs will be non-null, but will be the result of applying func against the underlying column data, which may be garbage.
Examples
The user function should loop over the columns and set the output for each row. Loop execution order is arbitrary, so each iteration of the loop MUST be independent of the others.
When func is invoked, the array arguments corresponding to the inputs and outputs are strided to improve GPU parallelism. The loop in the function resembles serial code, but executes concurrently in multiple threads.
>>> import cudf
>>> import numpy as np
>>> df = cudf.DataFrame()
>>> nelem = 3
>>> df['in1'] = np.arange(nelem)
>>> df['in2'] = np.arange(nelem)
>>> df['in3'] = np.arange(nelem)
Define input columns for the kernel:
>>> in1 = df['in1']
>>> in2 = df['in2']
>>> in3 = df['in3']
>>> def kernel(in1, in2, in3, out1, out2, kwarg1, kwarg2):
...     for i, (x, y, z) in enumerate(zip(in1, in2, in3)):
...         out1[i] = kwarg2 * x - kwarg1 * y
...         out2[i] = y - kwarg1 * z
Call .apply_rows with the names of the input columns, the names and dtypes of the output columns, and, optionally, a dict of extra arguments.
>>> df.apply_rows(kernel,
...               incols=['in1', 'in2', 'in3'],
...               outcols=dict(out1=np.float64, out2=np.float64),
...               kwargs=dict(kwarg1=3, kwarg2=4))
   in1  in2  in3  out1  out2
0    0    0    0   0.0   0.0
1    1    1    1   1.0  -2.0
2    2    2    2   2.0  -4.0
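Because the kernel body above is ordinary Python, the expected out1 and out2 values can be sanity-checked on the CPU by calling it serially on lists. This only checks the arithmetic (out1 = kwarg2*x - kwarg1*y, out2 = y - kwarg1*z); it does not reproduce the strided, concurrent GPU execution that apply_rows performs.

```python
def kernel(in1, in2, in3, out1, out2, kwarg1, kwarg2):
    # Same body as the documented kernel.
    for i, (x, y, z) in enumerate(zip(in1, in2, in3)):
        out1[i] = kwarg2 * x - kwarg1 * y
        out2[i] = y - kwarg1 * z

nelem = 3
in1 = in2 = in3 = [float(i) for i in range(nelem)]  # stands in for np.arange(3)
out1 = [0.0] * nelem  # preallocated outputs, as apply_rows would provide
out2 = [0.0] * nelem

kernel(in1, in2, in3, out1, out2, kwarg1=3, kwarg2=4)
print(out1)
print(out2)
```

The printed lists match the out1 and out2 columns in the apply_rows output above.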
-
argsort(ascending=True, na_position='last')
Return the indices that would sort the DataFrame by its values.
- Parameters
- ascending: bool or list of bool, default True
If True, sort values in ascending order, otherwise descending.
- na_position: {'first' or 'last'}, default 'last'
Argument 'first' puts NaNs at the beginning, 'last' puts NaNs at the end.
- Returns
- out_column_inds: cuDF Column of indices sorted based on input
Notes
Differences from pandas:
Only axis='index' is supported.
The inplace and kind parameters are not supported.
ascending can be a list of bools to control the sort order per column.
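A single-column model of argsort with na_position, written in plain Python: missing values (None or float NaN) are set aside, the remaining positions are sorted by value, and the missing positions are placed first or last. argsort_with_nulls is a hypothetical helper; the per-column list form of ascending is not modelled.

```python
import math
from typing import List, Optional

def argsort_with_nulls(values: List[Optional[float]], ascending: bool = True,
                       na_position: str = "last") -> List[int]:
    """Return row positions that would sort `values`, grouping NA per na_position."""
    def is_na(v) -> bool:
        return v is None or (isinstance(v, float) and math.isnan(v))

    valid = [i for i, v in enumerate(values) if not is_na(v)]
    missing = [i for i, v in enumerate(values) if is_na(v)]
    valid.sort(key=lambda i: values[i], reverse=not ascending)
    return missing + valid if na_position == "first" else valid + missing

data = [3.0, None, 1.0, float("nan"), 2.0]
print(argsort_with_nulls(data))                       # NA positions at the end
print(argsort_with_nulls(data, na_position="first"))  # NA positions at the start
```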
-
as_gpu_matrix(columns=None, order='F')
Convert to a matrix in device memory.
- Parameters
- columns: sequence of str
List of column names to be extracted. The order is preserved. If None is specified, all columns are used.
- order: 'F' or 'C'
Optional argument to determine whether to return a column major (Fortran) matrix or a row major (C) matrix.
- Returns
- A (nrow x ncol) numba device ndarray
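The order argument only changes how the same (nrow x ncol) values are laid out in the flat buffer behind the matrix. This plain-Python sketch (no Numba; flatten is a hypothetical helper) shows the two layouts for a two-column table.

```python
from typing import Dict, List

def flatten(columns: Dict[str, List[float]], order: str = "F") -> List[float]:
    """Lay out an (nrow x ncol) table in one flat buffer.

    'F' (column-major) stores each column contiguously;
    'C' (row-major) stores each row contiguously."""
    cols = list(columns.values())
    nrow = len(cols[0])
    if order == "F":
        return [v for col in cols for v in col]
    if order == "C":
        return [col[r] for r in range(nrow) for col in cols]
    raise ValueError("order must be 'F' or 'C'")

table = {"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]}
print(flatten(table, "F"))
print(flatten(table, "C"))
```

Column-major matches how a columnar DataFrame already stores each column, which is plausibly why 'F' is the default; that rationale is an inference, not something stated above.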
-
as_matrix(columns=None)
Convert to a matrix in host memory.
- Parameters
- columns: sequence of str
List of column names to be extracted. The order is preserved. If None is specified, all columns are used.
- Returns
- A (nrow x ncol) numpy ndarray in “F” order.
-
asin()
Get trigonometric inverse sine, element-wise.
The inverse of sin, so that if y = x.sin(), then x = y.asin() for values of x in [-π/2, π/2].
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.asin()
0   -1.570796
1    0.000000
2    1.570796
3    0.330314
4    0.523599
dtype: float64
asin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.asin()
      first    second
0 -1.570796  0.236190
1  0.000000  0.304693
2  0.523599  0.100167
asin operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64')
>>> index.asin()
Float64Index([-1.5707963267948966, 0.41151684606748806,
               1.5707963267948966,  0.3046926540153975],
             dtype='float64')
-
assign(**kwargs)
Assign columns to DataFrame from keyword arguments.
Examples
>>> import cudf
>>> df = cudf.DataFrame()
>>> df = df.assign(a=[0, 1, 2], b=[3, 4, 5])
>>> print(df)
   a  b
0  0  3
1  1  4
2  2  5
-
astype(dtype, copy=False, errors='raise', **kwargs)
Cast the DataFrame to the given dtype.
- Parameters
- dtype: data type, or dict of column name -> data type
Use a numpy.dtype or Python type to cast the entire DataFrame object to the same type. Alternatively, use {col: dtype, ...}, where col is a column label and dtype is a numpy.dtype or Python type, to cast one or more of the DataFrame’s columns to column-specific types.
- copy: bool, default False
Return a deep copy when copy=True. Note that copy=False is the default, so changes to the values may propagate to other cudf objects.
- errors: {'raise', 'ignore', 'warn'}, default 'raise'
Control raising of exceptions on invalid data for the provided dtype.
raise: allow exceptions to be raised.
ignore: suppress exceptions. On error, return the original object.
warn: print the last exceptions as warnings and return the original object.
- **kwargs: extra arguments to pass on to the constructor
- Returns
- casted: DataFrame
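The three errors modes differ only in what happens after a failed cast. The plain-Python model below (cast_values is a hypothetical helper, and the choice of TypeError/ValueError as "invalid data" is an assumption) follows the descriptions above: raise propagates, ignore silently returns the original data, and warn emits a warning before returning the original data.

```python
import warnings
from typing import Any, Callable, List

def cast_values(values: List[Any], to: Callable[[Any], Any],
                errors: str = "raise") -> List[Any]:
    """Cast every value with `to`, handling a failed cast according to `errors`."""
    if errors not in ("raise", "ignore", "warn"):
        raise ValueError("errors must be 'raise', 'ignore' or 'warn'")
    try:
        return [to(v) for v in values]
    except (TypeError, ValueError) as exc:
        if errors == "raise":
            raise
        if errors == "warn":
            warnings.warn(str(exc))
        # 'ignore' and 'warn' both hand back the original, uncast data.
        return values

print(cast_values(["1", "2"], int))
print(cast_values(["1", "x"], int, errors="ignore"))
```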
-
property
at
Alias for DataFrame.loc; provided for compatibility with Pandas.
-
atan()
Get trigonometric inverse tangent, element-wise.
The inverse of tan, so that if y = x.tan(), then x = y.atan() for values of x in (-π/2, π/2).
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10])
>>> ser
0    -1.00000
1     0.00000
2     1.00000
3     0.32434
4     0.50000
5   -10.00000
dtype: float64
>>> ser.atan()
0   -0.785398
1    0.000000
2    0.785398
3    0.313635
4    0.463648
5   -1.471128
dtype: float64
atan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.atan()
      first    second
0 -0.785398  0.229864
1 -1.471128  0.291457
2  0.463648  1.471128
atan operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.atan()
Float64Index([-0.7853981633974483, 0.3805063771123649,
               0.7853981633974483, 0.0, 0.2914567944778671],
             dtype='float64')
-
clip(lower=None, upper=None, inplace=False, axis=1)
Trim values at input threshold(s).
Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.
- Parameters
- lower: scalar or array_like, default None
Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.
- upper: scalar or array_like, default None
Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series, upper is expected to be a scalar or an array of size 1.
- inplace: bool, default False
Whether to perform the operation in place on the data.
- Returns
- Clipped DataFrame/Series/Index/MultiIndex
Examples
>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4], "b": ['a', 'b', 'c', 'd']})
>>> df.clip(lower=[2, 'b'], upper=[3, 'c'])
   a  b
0  2  b
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=None, upper=[3, 'c'])
   a  b
0  1  a
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=[2, 'b'], upper=None)
   a  b
0  2  b
1  2  b
2  3  c
3  4  d
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4], "b": [1, 3, 5, 7]})
>>> df.clip(lower=2, upper=3, inplace=True)
>>> df
   a  b
0  2  2
1  2  3
2  3  3
3  3  3
>>> import cudf
>>> sr = cudf.Series([1, 2, 3, 4])
>>> sr.clip(lower=2, upper=3)
0    2
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=None, upper=3)
0    1
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True)
>>> sr
0    2
1    2
2    3
3    4
dtype: int64
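The per-column form of clip pairs each column with the bound at the same position in lower and upper. A plain-Python model (clip here is a hypothetical helper, not the cuDF method) reproduces the first DataFrame example above; the string column is clipped lexicographically.

```python
def clip(values, lower=None, upper=None):
    """Replace values below `lower` with `lower` and above `upper` with `upper`.

    A bound of None disables clipping on that side."""
    out = []
    for v in values:
        if lower is not None and v < lower:
            v = lower
        if upper is not None and v > upper:
            v = upper
        out.append(v)
    return out

frame = {"a": [1, 2, 3, 4], "b": ["a", "b", "c", "d"]}
lowers, uppers = [2, "b"], [3, "c"]

# One (lower, upper) pair per column, matched by position.
clipped = {name: clip(col, lo, hi)
           for (name, col), lo, hi in zip(frame.items(), lowers, uppers)}
print(clipped)
```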
-
property
columns
Returns a tuple of columns.
-
copy(deep=True)
Returns a copy of this dataframe.
- Parameters
- deep: bool, default True
Make a full copy of Series columns and Index at the GPU level, or create a new allocation with references.
-
corr()
Compute the correlation matrix of a DataFrame.
-
cos()
Get trigonometric cosine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.cos()
0    1.000000
1    0.947861
2    0.877583
3    0.525322
4   -0.448074
5   -0.598460
6   -0.283691
dtype: float64
cos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.cos()
      first    second
0  1.000000  0.862319
1  0.283662 -0.283691
2 -0.839072 -0.839039
3 -0.759688 -0.022097
cos operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.cos()
Float64Index([ 0.9210609940028851,  0.8623188722876839,
              -0.5984600690578581, -0.4480736161291701],
             dtype='float64')
-
count(axis=0, level=None, numeric_only=False, **kwargs)
Count non-NA cells for each column or row.
The values None, NaN, and NaT are considered NA.
- Returns
- Series
For each column/row the number of non-NA/null entries.
Notes
Parameters currently not supported are axis, level, numeric_only.
Examples
>>> import cudf
>>> import numpy as np
>>> df = cudf.DataFrame({"Person":
...                      ["John", "Myla", "Lewis", "John", "Myla"],
...                      "Age": [24., np.nan, 21., 33, 26],
...                      "Single": [False, True, True, True, False]})
>>> df.count()
Person    5
Age       4
Single    5
dtype: int64
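A plain-Python model of the per-column count above: an entry counts unless it is None or a float NaN. Note that False is a valid value and is counted. count_non_na is a hypothetical helper, and NaT is not modelled here.

```python
import math

def count_non_na(values) -> int:
    """Count entries that are neither None nor a float NaN."""
    return sum(
        1 for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    )

columns = {
    "Person": ["John", "Myla", "Lewis", "John", "Myla"],
    "Age": [24.0, float("nan"), 21.0, 33.0, 26.0],
    "Single": [False, True, True, True, False],
}
counts = {name: count_non_na(col) for name, col in columns.items()}
print(counts)
```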
-
cov(**kwargs)
Compute the covariance matrix of a DataFrame.
- Parameters
- **kwargs
Keyword arguments to be passed to cupy.cov
- Returns
- cov: DataFrame
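For reference, each entry of the covariance matrix is the sample covariance of a pair of columns. The plain-Python sketch below assumes cupy.cov's default normalisation by N - 1 (as in numpy.cov with bias=False); passing other keyword arguments through **kwargs may change that. cov_matrix is a hypothetical helper.

```python
from typing import Dict, List, Tuple

def cov_matrix(columns: Dict[str, List[float]],
               ddof: int = 1) -> Dict[Tuple[str, str], float]:
    """Pairwise covariance of equal-length columns, normalised by N - ddof."""
    names = list(columns)
    n = len(columns[names[0]])
    means = {name: sum(vals) / n for name, vals in columns.items()}

    def cov(x: str, y: str) -> float:
        return sum((a - means[x]) * (b - means[y])
                   for a, b in zip(columns[x], columns[y])) / (n - ddof)

    return {(x, y): cov(x, y) for x in names for y in names}

result = cov_matrix({"a": [1.0, 2.0, 3.0], "b": [2.0, 4.0, 6.0]})
print(result)
```

The matrix is symmetric, and its diagonal holds each column's sample variance.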
-
cummax(axis=None, skipna=True, *args, **kwargs)
Return cumulative maximum of the DataFrame.
- Parameters
- skipna: bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- Returns
- DataFrame
Notes
The axis parameter is currently not supported.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.cummax()
   a   b
0  1   7
1  2   8
2  3   9
3  4  10
-
cummin(axis=None, skipna=True, *args, **kwargs)
Return cumulative minimum of the DataFrame.
- Parameters
- skipna: bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- Returns
- DataFrame
Notes
The axis parameter is currently not supported.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.cummin()
   a  b
0  1  7
1  1  7
2  1  7
3  1  7
-
cumprod(axis=None, skipna=True, *args, **kwargs)
Return cumulative product of the DataFrame.
- Parameters
- skipna: bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- Returns
- DataFrame
Notes
The axis parameter is currently not supported.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.cumprod()
    a     b
0   1     7
1   2    56
2   6   504
3  24  5040
-
cumsum(axis=None, skipna=True, *args, **kwargs)
Return cumulative sum of the DataFrame.
- Parameters
- skipna: bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- Returns
- DataFrame
Notes
Parameters currently not supported is axis
Examples
>>> import cudf >>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]}) >>> s.cumsum() a b 0 1 7 1 3 15 2 6 24 3 10 34
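The cumulative methods cummax, cummin, cumprod and cumsum share one null-handling rule. Below is a minimal plain-Python sketch. It assumes cuDF follows pandas' semantics:

- With skipna=True, a null position stays null and the running value carries on past it.
- With skipna=False, every position after the first null becomes null.

`cumulative` is a hypothetical helper, and None stands in for null.

```python
import operator

def cumulative(values, op, skipna=True):
    """Running reduction with pandas-style null handling (hypothetical helper)."""
    out = []
    acc = None
    seen_null = False
    for v in values:
        if v is None:
            seen_null = True
            out.append(None)  # a null input always yields a null output
            continue
        if seen_null and not skipna:
            out.append(None)  # skipna=False: nulls propagate forward
            continue
        acc = v if acc is None else op(acc, v)
        out.append(acc)
    return out

data = [3, None, 1, 4]
print(cumulative(data, max))                         # [3, None, 3, 4]
print(cumulative(data, min))                         # [3, None, 1, 1]
print(cumulative(data, operator.add))                # [3, None, 4, 8]
print(cumulative(data, operator.mul, skipna=False))  # [3, None, None, None]
```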
-
describe(percentiles=None, include=None, exclude=None)
Compute summary statistics of a DataFrame’s columns. For numeric data, the output includes the count, minimum, maximum, mean, median, standard deviation, and various quantiles. For object data, the output includes the count, number of unique values, the most common value, and the number of occurrences of the most common value.
- Parameters
- percentiles : list-like, optional
The percentiles used to generate the output summary statistics. If None, the default percentiles used are the 25th, 50th and 75th. Values should be within the interval [0, 1].
- include: str, list-like, optional
The dtypes to be included in the output summary statistics. Columns of dtypes not included in this list will not be part of the output. If include=’all’, all dtypes are included. Default of None includes all numeric columns.
- exclude: str, list-like, optional
The dtypes to be excluded from the output summary statistics. Columns of dtypes included in this list will not be part of the output. Default of None excludes no columns.
- Returns
- output_frame : DataFrame
Summary statistics of relevant columns in the original dataframe.
Examples
Describing a Series containing numeric values.
>>> import cudf
>>> s = cudf.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> print(s.describe())
   stats   values
0  count     10.0
1   mean      5.5
2    std  3.02765
3    min      1.0
4    25%     3.25
5    50%      5.5
6    75%     7.75
7    max     10.0
Describing a DataFrame. By default, all numeric columns are returned.
>>> gdf = cudf.DataFrame()
>>> gdf['a'] = [1, 2, 3]
>>> gdf['b'] = [1.0, 2.0, 3.0]
>>> gdf['c'] = ['x', 'y', 'z']
>>> gdf['d'] = [1.0, 2.0, 3.0]
>>> gdf['d'] = gdf['d'].astype('float32')
>>> print(gdf.describe())
   stats    a    b    d
0  count  3.0  3.0  3.0
1   mean  2.0  2.0  2.0
2    std  1.0  1.0  1.0
3    min  1.0  1.0  1.0
4    25%  1.5  1.5  1.5
5    50%  2.0  2.0  2.0
6    75%  2.5  2.5  2.5
7    max  3.0  3.0  3.0
Using the include keyword to describe only specific dtypes.
>>> gdf = cudf.DataFrame()
>>> gdf['a'] = [1, 2, 3]
>>> gdf['b'] = [1.0, 2.0, 3.0]
>>> gdf['c'] = ['x', 'y', 'z']
>>> print(gdf.describe(include='int'))
   stats    a
0  count  3.0
1   mean  2.0
2    std  1.0
3    min  1.0
4    25%  1.5
5    50%  2.0
6    75%  2.5
7    max  3.0
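The quantile rows in describe depend on an interpolation rule. The plain-Python sketch below computes the numeric summary assuming the linear interpolation that pandas uses by default. `percentile_linear` and `describe_numeric` are hypothetical helpers, and the std row uses the sample (N - 1) standard deviation.

```python
import statistics

def percentile_linear(values, q):
    """q-th quantile (0 <= q <= 1) with linear interpolation between ranks."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def describe_numeric(values, percentiles=(0.25, 0.5, 0.75)):
    """Numeric summary in the same row order as describe()."""
    summary = {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (N - 1)
        "min": min(values),
    }
    for q in percentiles:
        summary[f"{q:.0%}"] = percentile_linear(values, q)  # e.g. "25%"
    summary["max"] = max(values)
    return summary

print(describe_numeric([1, 2, 3]))  # 25%: 1.5, 50%: 2.0, 75%: 2.5
print(describe_numeric(list(range(1, 11)))["25%"])  # 3.25
```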
-
div(other, axis='columns', level=None, fill_value=None)
Get floating division of dataframe and other, element-wise (binary operator truediv).
Equivalent to dataframe / other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rtruediv.
This is one of the flexible wrappers (add, sub, mul, div, mod, pow) around the arithmetic operators +, -, *, /, //, %, **.
- Parameters
- other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing, the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'angles': [0, 3, 4],
...                      'degrees': [360, 180, 360]},
...                     index=['circle', 'triangle', 'rectangle'])
>>> df.truediv(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df / 10
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
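The fill_value rule above can be modelled on two plain lists, with None marking a missing value. If only one side is missing, it is replaced before dividing. If both sides are missing, the result stays missing. `truediv_fill` is a hypothetical helper, and index alignment and division by zero are not modelled.

```python
def truediv_fill(left, right, fill_value=None):
    """Element-wise left / right with fill_value handling (hypothetical helper)."""
    out = []
    for a, b in zip(left, right):
        if a is None and b is None:
            out.append(None)  # both sides missing: result stays missing
            continue
        if fill_value is not None:
            a = fill_value if a is None else a
            b = fill_value if b is None else b
        out.append(None if a is None or b is None else a / b)
    return out

print(truediv_fill([10, None, 30, None], [2, 5, None, None], fill_value=1))
# [5.0, 0.2, 30.0, None]
```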
-
drop(labels=None, axis=None, columns=None, errors='raise', inplace=False)
Drop column(s).
- Parameters
- labels : str or sequence of strings
Name of column(s) to be dropped.
- axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Only axis=1 is currently supported.
- columns : array of column names
The same as using labels with axis=1.
- errors : {‘ignore’, ‘raise’}, default ‘raise’
This parameter is currently ignored.
- inplace : bool, default False
If True, do operation inplace and return self.
- Returns
- DataFrame
A DataFrame without the dropped column(s).
Examples
>>> import cudf
>>> df = cudf.DataFrame()
>>> df['key'] = [0, 1, 2, 3, 4]
>>> df['val'] = [float(i + 10) for i in range(5)]
>>> df_new = df.drop('val')
>>> print(df)
   key   val
0    0  10.0
1    1  11.0
2    2  12.0
3    3  13.0
4    4  14.0
>>> print(df_new)
   key
0    0
1    1
2    2
3    3
4    4
-
drop_column(name)
Drop a column by name.
-
drop_duplicates(subset=None, keep='first', inplace=False, ignore_index=False)
Return DataFrame with duplicate rows removed, optionally only considering a certain subset of columns.
-
dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
Drop rows (or columns) containing nulls from a DataFrame.
- Parameters
- axis : {0, 1}, optional
Whether to drop rows (axis=0, default) or columns (axis=1) containing nulls.
- how : {“any”, “all”}, optional
Specifies how to decide whether to drop a row (or column). any (default) drops rows (or columns) containing at least one null value. all drops only rows (or columns) in which every value is null.
- thresh: int, optional
If specified, drops every row (or column) containing fewer than thresh non-null values.
- subset : list, optional
List of columns to consider when dropping rows (all columns are considered by default). Alternatively, when dropping columns, subset is a list of rows to consider.
- inplace : bool, default False
If True, do operation inplace and return None.
- Returns
- Copy of the DataFrame with rows/columns containing nulls dropped.
See also
cudf.core.dataframe.DataFrame.isna : Indicate null values.
cudf.core.dataframe.DataFrame.notna : Indicate non-null values.
cudf.core.dataframe.DataFrame.fillna : Replace null values.
cudf.core.series.Series.dropna : Drop null values.
cudf.core.index.Index.dropna : Drop null indices.
Examples
>>> import cudf
>>> import numpy as np
>>> df = cudf.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
...                    "toy": ['Batmobile', None, 'Bullwhip'],
...                    "born": [np.datetime64("1940-04-25"),
...                             np.datetime64("NaT"),
...                             np.datetime64("NaT")]})
>>> df
       name        toy        born
0    Alfred  Batmobile  1940-04-25
1    Batman       None        null
2  Catwoman   Bullwhip        null
Drop the rows where at least one element is null.
>>> df.dropna()
     name        toy        born
0  Alfred  Batmobile  1940-04-25
Drop the columns where at least one element is null.
>>> df.dropna(axis='columns')
       name
0    Alfred
1    Batman
2  Catwoman
Drop the rows where all elements are null.
>>> df.dropna(how='all')
       name        toy        born
0    Alfred  Batmobile  1940-04-25
1    Batman       None        null
2  Catwoman   Bullwhip        null
Keep only the rows with at least 2 non-null values.
>>> df.dropna(thresh=2)
       name        toy        born
0    Alfred  Batmobile  1940-04-25
2  Catwoman   Bullwhip        null
Define in which columns to look for null values.
>>> df.dropna(subset=['name', 'born'])
     name        toy        born
0  Alfred  Batmobile  1940-04-25
Keep the DataFrame with valid entries in the same variable.
>>> df.dropna(inplace=True)
>>> df
     name        toy        born
0  Alfred  Batmobile  1940-04-25
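The plain-Python sketch below shows how how, thresh and subset interact. Rows are stored as dicts, and None stands in for any null. The sketch assumes, as in pandas, that thresh takes precedence over how when both are given. `dropna_rows` is hypothetical and only covers axis=0.

```python
def dropna_rows(rows, how="any", thresh=None, subset=None):
    """Filter a list of row dicts like DataFrame.dropna(axis=0) (hypothetical helper)."""
    kept = []
    for row in rows:
        keys = subset if subset is not None else list(row)
        non_null = sum(row[k] is not None for k in keys)
        if thresh is not None:
            keep = non_null >= thresh          # thresh overrides how
        elif how == "any":
            keep = non_null == len(keys)       # no nulls allowed
        else:                                  # how == "all"
            keep = non_null > 0                # drop only all-null rows
        if keep:
            kept.append(row)
    return kept

rows = [
    {"name": "Alfred", "toy": "Batmobile", "born": "1940-04-25"},
    {"name": "Batman", "toy": None, "born": None},
    {"name": "Catwoman", "toy": "Bullwhip", "born": None},
]
print([r["name"] for r in dropna_rows(rows)])            # ['Alfred']
print([r["name"] for r in dropna_rows(rows, thresh=2)])  # ['Alfred', 'Catwoman']
```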
-
property
dtypes
Return the dtypes in this object.
-
property
empty
Indicates whether the DataFrame or Series is empty.
True if DataFrame/Series is entirely empty (no items), meaning any of the axes are of length 0.
- Returns
- out : bool
If DataFrame/Series is empty, return True, if not return False.
Notes
If DataFrame/Series contains only null values, it is still not considered empty. See the example below.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'A' : []})
>>> df
Empty DataFrame
Columns: [A]
Index: []
>>> df.empty
True
If we only have null values in our DataFrame, it is not considered empty! We will need to drop the nulls to make the DataFrame empty:
>>> df = cudf.DataFrame({'A' : [None, None]})
>>> df
      A
0  null
1  null
>>> df.empty
False
>>> df.dropna().empty
True
Non-empty and empty Series example:
>>> s = cudf.Series([1, 2, None])
>>> s
0       1
1       2
2    null
dtype: int64
>>> s.empty
False
>>> s = cudf.Series([])
>>> s
Series([], dtype: float64)
>>> s.empty
True
-
equals(other)
Test whether two objects contain the same elements. This function allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements. NaNs in the same location are considered equal. The column headers do not need to have the same type.
- Parameters
- other : Series or DataFrame
The other Series or DataFrame to be compared with the first.
- Returns
- bool
True if all elements are the same in both objects, False otherwise.
Examples
>>> import cudf
>>> df = cudf.DataFrame({1: [10], 2: [20]})
>>> df
    1   2
0  10  20
>>> exactly_equal = cudf.DataFrame({1: [10], 2: [20]})
>>> exactly_equal
    1   2
0  10  20
>>> df.equals(exactly_equal)
True
>>> different_column_type = cudf.DataFrame({1.0: [10], 2.0: [20]})
>>> different_column_type
   1.0  2.0
0   10   20
>>> df.equals(different_column_type)
True
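The comparison rule for equals can be sketched with plain dicts of lists. Labels are compared by value, so 1 and 1.0 match, and two NaNs in the same position count as equal. `frames_equal` is a hypothetical helper. Unlike the real method, it does not check dtypes, so 10 and 10.0 compare equal here.

```python
import math

def _same(a, b):
    """Element equality where NaN == NaN."""
    if isinstance(a, float) and isinstance(b, float) and math.isnan(a) and math.isnan(b):
        return True
    return a == b

def frames_equal(left, right):
    """Compare two dicts of column label -> list of values (hypothetical helper)."""
    # Labels are compared by value, so 1 and 1.0 are treated as the same label.
    if list(left) != list(right):
        return False
    for lcol, rcol in zip(left.values(), right.values()):
        if len(lcol) != len(rcol):
            return False
        if not all(_same(a, b) for a, b in zip(lcol, rcol)):
            return False
    return True

print(frames_equal({1: [10], 2: [20]}, {1.0: [10], 2.0: [20]}))  # True
print(frames_equal({"a": [float("nan")]}, {"a": [float("nan")]}))  # True
print(frames_equal({"a": [1]}, {"a": [2]}))  # False
```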
-
exp()
Get the exponential of all elements, element-wise.
Exponential is the inverse of the log function, so that x.exp().log() = x (up to floating-point rounding).
- Returns
- DataFrame/Series/Index
Result of the element-wise exponential.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.exp()
0    3.678794e-01
1    1.000000e+00
2    2.718282e+00
3    1.383117e+00
4    1.648721e+00
5    4.539993e-05
6    2.688117e+43
dtype: float64
exp operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.exp()
      first        second
0  0.367879      1.263644
1  0.000045      1.349859
2  1.648721  22026.465795
exp operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.exp()
Float64Index([0.36787944117144233, 1.4918246976412703, 2.718281828459045, 1.0,
              1.3498588075760032],
             dtype='float64')
-
fillna(value, method=None, axis=None, inplace=False, limit=None)
Fill null values with value.
- Parameters
- value : scalar, Series-like or dict
Value to use to fill nulls. If Series-like, null values are filled with values in corresponding indices. A dict can be used to provide different values to fill nulls in different columns.
- Returns
- result : DataFrame
Copy with nulls filled.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, None], 'b': [3, None, 5]})
>>> df
      a     b
0     1     3
1     2  null
2  null     5
>>> df.fillna(4)
   a  b
0  1  3
1  2  4
2  4  5
>>> df.fillna({'a': 3, 'b': 4})
   a  b
0  1  3
1  2  4
2  3  5
fillna on a Series object:
>>> ser = cudf.Series(['a', 'b', None, 'c'])
>>> ser
0       a
1       b
2    None
3       c
dtype: object
>>> ser.fillna('z')
0    a
1    b
2    z
3    c
dtype: object
fillna also supports in-place operation:
>>> ser.fillna('z', inplace=True)
>>> ser
0    a
1    b
2    z
3    c
dtype: object
>>> df.fillna({'a': 3, 'b': 4}, inplace=True)
>>> df
   a  b
0  1  3
1  2  4
2  3  5
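The plain-Python sketch below shows the scalar and dict forms of value, with None as the null marker. The helper `fill_nulls` is hypothetical. Leaving columns that are absent from the dict untouched is an assumption based on pandas' behavior.

```python
def fill_nulls(columns, value):
    """Return a copy of columns (name -> list) with None replaced (hypothetical helper)."""
    out = {}
    for name, col in columns.items():
        if isinstance(value, dict):
            if name not in value:
                out[name] = list(col)  # assumption: unlisted columns are unchanged
                continue
            fill = value[name]
        else:
            fill = value  # a scalar fills every column
        out[name] = [fill if v is None else v for v in col]
    return out

df = {"a": [1, 2, None], "b": [3, None, 5]}
print(fill_nulls(df, 4))                 # {'a': [1, 2, 4], 'b': [3, 4, 5]}
print(fill_nulls(df, {"a": 3, "b": 4}))  # {'a': [1, 2, 3], 'b': [3, 4, 5]}
```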
-
floordiv(other, axis='columns', level=None, fill_value=None)
Get integer division of dataframe and other, element-wise (binary operator floordiv).
Equivalent to dataframe // other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rfloordiv.
This is one of the flexible wrappers (add, sub, mul, div, mod, pow) around the arithmetic operators +, -, *, /, //, %, **.
- Parameters
- other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing, the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'angles': [1, 3, 4],
...                      'degrees': [360, 180, 360]},
...                     index=['circle', 'triangle', 'rectangle'])
>>> df.floordiv(2)
           angles  degrees
circle          0      180
triangle        1       90
rectangle       2      180
>>> df // 2
           angles  degrees
circle          0      180
triangle        1       90
rectangle       2      180
-
classmethod
from_arrow(table)
Convert from a PyArrow Table.
- Parameters
- table : PyArrow Table object
The PyArrow Table to be converted to a cuDF DataFrame.
- Raises
- TypeError for invalid input type.
Notes
Does not support automatically setting index column(s) similar to how to_pandas works for PyArrow Tables.
Examples
>>> import pyarrow as pa
>>> import cudf
>>> data = [pa.array([1, 2, 3]), pa.array([4, 5, 6])]
>>> batch = pa.RecordBatch.from_arrays(data, ['f0', 'f1'])
>>> table = pa.Table.from_batches([batch])
>>> cudf.DataFrame.from_arrow(table)
   f0  f1
0   1   4
1   2   5
2   3   6
-
classmethod
from_gpu_matrix(data, index=None, columns=None, nan_as_null=False)
Convert from a numba GPU ndarray.
- Parameters
- data : numba GPU ndarray
- index : str, Index
The name of the index column in data or an Index itself. If None, the default index is used.
- columns : list of str
List of column names to include.
- Returns
- DataFrame
-
classmethod
from_pandas(dataframe, nan_as_null=None)
Convert from a Pandas DataFrame.
- Parameters
- dataframe : Pandas DataFrame object
A Pandas DataFrame object which has to be converted to a cuDF DataFrame.
- nan_as_null : bool, default True
If True, converts np.nan values to null values. If False, leaves np.nan values as is.
- Raises
- TypeError for invalid input type.
Examples
>>> import cudf
>>> import pandas as pd
>>> data = [[0, 1], [1, 2], [3, 4]]
>>> pdf = pd.DataFrame(data, columns=['a', 'b'], dtype=int)
>>> cudf.from_pandas(pdf)
   a  b
0  0  1
1  1  2
2  3  4
-
classmethod
from_records(data, index=None, columns=None, nan_as_null=False)
Convert structured or record ndarray to DataFrame.
- Parameters
- data : numpy structured dtype or recarray of ndim=2
- index : str, array-like
The name of the index column in data. If None, the default index is used.
- columns : list of str
List of column names to include.
- Returns
- DataFrame
-
groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, observed=False, dropna=True, method=None)
Group DataFrame using a mapper or by a Series of columns.
A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
- Parameters
- by : mapping, function, label, or list of labels
Used to determine the groups for the groupby. If by is a function, it’s called on each value of the object’s index. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups (the Series’ values are first aligned; see .align() method). If a cupy array is passed, the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.
- level : int, level name, or sequence of such, default None
If the axis is a MultiIndex (hierarchical), group by a particular level or levels.
- as_index : bool, default True
For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output.
- sort : bool, default True
Sort group keys. Turning this off can improve performance. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.
- dropna : bool, optional
If True (default), do not include the “null” group.
- Returns
- DataFrameGroupBy
Returns a groupby object that contains information about the groups.
Examples
>>> import cudf
>>> import pandas as pd
>>> df = cudf.DataFrame({'Animal': ['Falcon', 'Falcon',
...                                 'Parrot', 'Parrot'],
...                      'Max Speed': [380., 370., 24., 26.]})
>>> df
   Animal  Max Speed
0  Falcon      380.0
1  Falcon      370.0
2  Parrot       24.0
3  Parrot       26.0
>>> df.groupby(['Animal']).mean()
        Max Speed
Animal
Falcon      375.0
Parrot       25.0
>>> arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'],
...           ['Captive', 'Wild', 'Captive', 'Wild']]
>>> index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type'))
>>> df = cudf.DataFrame({'Max Speed': [390., 350., 30., 20.]},
...                     index=index)
>>> df
                Max Speed
Animal Type
Falcon Captive      390.0
       Wild         350.0
Parrot Captive       30.0
       Wild          20.0
>>> df.groupby(level=0).mean()
        Max Speed
Animal
Falcon      370.0
Parrot       25.0
>>> df.groupby(level="Type").mean()
         Max Speed
Type
Captive      210.0
Wild         185.0
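The split-apply-combine description above can be illustrated without a GPU:

1. Split the values by key.
2. Apply a mean to each group.
3. Combine the results.

`groupby_mean` is a hypothetical helper. It mirrors dropna=True by skipping None keys and sort=True by sorting the group keys.

```python
from collections import defaultdict

def groupby_mean(keys, values, sort=True):
    """Split values by key, average each group, combine into a dict (hypothetical helper)."""
    groups = defaultdict(list)
    for k, v in zip(keys, values):
        if k is None:  # mirrors dropna=True: the null group is excluded
            continue
        groups[k].append(v)
    ordered = sorted(groups) if sort else list(groups)
    return {k: sum(groups[k]) / len(groups[k]) for k in ordered}

animals = ["Falcon", "Falcon", "Parrot", "Parrot"]
speeds = [380.0, 370.0, 24.0, 26.0]
print(groupby_mean(animals, speeds))  # {'Falcon': 375.0, 'Parrot': 25.0}
```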
-
hash_columns(columns=None)
Hash the given columns and return a new device array.
- Parameters
- columns : sequence of str, optional
Sequence of column names. If columns is None (unspecified), all columns in the frame are used.
-
head(n=5)
Returns the first n rows as a new DataFrame.
Examples
>>> import cudf
>>> df = cudf.DataFrame()
>>> df['key'] = [0, 1, 2, 3, 4]
>>> df['val'] = [float(i + 10) for i in range(5)]  # insert column
>>> print(df.head(2))
   key   val
0    0  10.0
1    1  11.0
-
property
iat
Alias for DataFrame.iloc; provided for compatibility with Pandas.
-
property
iloc
Select rows and columns by position.
See also
Notes
One notable difference from Pandas arises when the DataFrame has mixed types and Pandas would return a Series. In that case cuDF returns a DataFrame instead, because it does not yet support mixed types within a Series.
Mixed-dtype single-row output is returned as a DataFrame (Pandas returns a Series):
>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"]})
>>> df.iloc[0]
   a  b
0  1  a
Examples
>>> df = cudf.DataFrame({'a': range(20),
...                      'b': range(20),
...                      'c': range(20)})
Select a single row using an integer index.
>>> print(df.iloc[1])
a    1
b    1
c    1
Select multiple rows using a list of integers.
>>> print(df.iloc[[0, 2, 9, 18]])
     a   b   c
0    0   0   0
2    2   2   2
9    9   9   9
18  18  18  18
Select rows using a slice.
>>> print(df.iloc[3:10:2])
   a  b  c
3  3  3  3
5  5  5  5
7  7  7  7
9  9  9  9
Select both rows and columns.
>>> print(df.iloc[[1, 3, 5, 7], 2])
1    1
3    3
5    5
7    7
Name: c, dtype: int64
Setting values using iloc.
>>> df.iloc[:4] = 0
>>> print(df)
   a  b  c
0  0  0  0
1  0  0  0
2  0  0  0
3  0  0  0
4  4  4  4
5  5  5  5
6  6  6  6
7  7  7  7
8  8  8  8
9  9  9  9
[10 more rows]
-
property
index
Returns the index of the DataFrame.
-
info(verbose=None, buf=None, max_cols=None, memory_usage=None, null_counts=None)
Print a concise summary of a DataFrame.
This method prints information about a DataFrame including the index dtype and column dtypes, non-null values and memory usage.
- Parameters
- verbose : bool, optional
Whether to print the full summary. By default, the setting in pandas.options.display.max_info_columns is followed.
- buf : writable buffer, defaults to sys.stdout
Where to send the output. By default, the output is printed to sys.stdout. Pass a writable buffer if you need to further process the output.
- max_cols : int, optional
When to switch from the verbose to the truncated output. If the DataFrame has more than max_cols columns, the truncated output is used. By default, the setting in pandas.options.display.max_info_columns is used.
- memory_usage : bool, str, optional
Specifies whether total memory usage of the DataFrame elements (including the index) should be displayed. By default, this follows the pandas.options.display.memory_usage setting. True always shows memory usage. False never shows memory usage. A value of ‘deep’ is equivalent to “True with deep introspection”. Memory usage is shown in human-readable units (base-2 representation). Without deep introspection, a memory estimation is made based on column dtype and number of rows, assuming values consume the same memory amount for corresponding dtypes. With deep memory introspection, a real memory usage calculation is performed at the cost of computational resources.
- null_counts : bool, optional
Whether to show the non-null counts. By default, this is shown only if the frame is smaller than pandas.options.display.max_info_rows and pandas.options.display.max_info_columns. A value of True always shows the counts, and False never shows the counts.
- Returns
- None
This method prints a summary of a DataFrame and returns None.
See also
DataFrame.describe : Generate descriptive statistics of DataFrame columns.
DataFrame.memory_usage : Memory usage of DataFrame columns.
Examples
>>> import cudf
>>> int_values = [1, 2, 3, 4, 5]
>>> text_values = ['alpha', 'beta', 'gamma', 'delta', 'epsilon']
>>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]
>>> df = cudf.DataFrame({"int_col": int_values,
...                      "text_col": text_values,
...                      "float_col": float_values})
>>> df
   int_col text_col  float_col
0        1    alpha       0.00
1        2     beta       0.25
2        3    gamma       0.50
3        4    delta       0.75
4        5  epsilon       1.00
Prints information of all columns:
>>> df.info(verbose=True)
<class 'cudf.core.dataframe.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   int_col    5 non-null      int64
 1   text_col   5 non-null      object
 2   float_col  5 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 130.0+ bytes
Prints a summary of the column count and dtypes, without per-column information:
>>> df.info(verbose=False)
<class 'cudf.core.dataframe.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Columns: 3 entries, int_col to float_col
dtypes: float64(1), int64(1), object(1)
memory usage: 130.0+ bytes
Pipe the output of DataFrame.info to a buffer instead of sys.stdout, get the buffer content and write it to a text file:
>>> import io
>>> buffer = io.StringIO()
>>> df.info(buf=buffer)
>>> s = buffer.getvalue()
>>> with open("df_info.txt", "w",
...           encoding="utf-8") as f:
...     f.write(s)
...
369
The memory_usage parameter allows deep introspection mode, which is especially useful for big DataFrames and for fine-tuning memory optimization:
>>> import numpy as np
>>> df = cudf.DataFrame({
...     'column_1': np.random.choice(['a', 'b', 'c'], 10 ** 6),
...     'column_2': np.random.choice(['a', 'b', 'c'], 10 ** 6),
...     'column_3': np.random.choice(['a', 'b', 'c'], 10 ** 6)
... })
>>> df.info(memory_usage='deep')
<class 'cudf.core.dataframe.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 3 columns):
 #   Column    Non-Null Count    Dtype
---  ------    --------------    -----
 0   column_1  1000000 non-null  object
 1   column_2  1000000 non-null  object
 2   column_3  1000000 non-null  object
dtypes: object(3)
memory usage: 14.3 MB
-
insert(loc, name, value)
Add a column to DataFrame at the index specified by loc.
- Parameters
- loc : int
Location to insert by index; cannot be greater than the number of columns + 1.
- name : number or string
Name or label of the column to be inserted.
- value : Series or array-like
-
interleave_columns()
Interleave Series columns of a table into a single column.
Converts the column-major table cols into a row-major column.
- Parameters
- cols : input Table containing columns to interleave.
- Returns
- The interleaved columns as a single column
Examples
>>> import cudf
>>> df = cudf.DataFrame({0: ['A1', 'A2', 'A3'], 1: ['B1', 'B2', 'B3']})
>>> df
    0   1
0  A1  B1
1  A2  B2
2  A3  B3
>>> df.interleave_columns()
0    A1
1    B1
2    A2
3    B2
4    A3
5    B3
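Row-major interleaving is the same as reading the table row by row. Here is a one-line plain-Python sketch. `interleave` is a hypothetical helper that takes a list of equal-length columns.

```python
def interleave(columns):
    """Flatten equal-length columns into one list in row-major order (hypothetical helper)."""
    # zip(*columns) yields the table's rows; flattening them interleaves the columns.
    return [value for row in zip(*columns) for value in row]

print(interleave([["A1", "A2", "A3"], ["B1", "B2", "B3"]]))
# ['A1', 'B1', 'A2', 'B2', 'A3', 'B3']
```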
-
isin(values)
Whether each element in the DataFrame is contained in values.
- Parameters
- values : iterable, Series, DataFrame or dict
The result will only be true at a location if all the labels match. If values is a Series, that’s the index. If values is a dict, the keys must be the column names, which must match. If values is a DataFrame, then both the index and column labels must match.
- Returns
- DataFrame
DataFrame of booleans showing whether each element in the DataFrame is contained in values.
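The sketch below shows how an iterable and a dict of values are applied, using plain dicts of lists. `isin_mask` is hypothetical. Series/DataFrame alignment on index and column labels is not modelled. Treating columns missing from the dict as all False is an assumption taken from pandas.

```python
def isin_mask(columns, values):
    """Boolean mask for a dict of column name -> list of values (hypothetical helper)."""
    if isinstance(values, dict):
        # Per-column allowed values; columns not in the dict match nothing.
        allowed = {name: set(values.get(name, ())) for name in columns}
    else:
        shared = set(values)
        allowed = {name: shared for name in columns}
    return {name: [v in allowed[name] for v in col] for name, col in columns.items()}

df = {"num_legs": [2, 4], "num_wings": [2, 0]}
print(isin_mask(df, [0, 2]))
# {'num_legs': [True, False], 'num_wings': [True, True]}
print(isin_mask(df, {"num_wings": [0, 3]}))
# {'num_legs': [False, False], 'num_wings': [False, True]}
```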
-
isna()
Identify missing values. Alias for isnull.
-
isnull()
Identify missing values.
-
iteritems()
Iterate over (column name, Series) pairs.
-
join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False, type='', method='hash')
Join columns with other DataFrame on index or on a key column.
- Parameters
- other : DataFrame
- how : str
Only accepts “left”, “right”, “inner”, “outer”.
- lsuffix, rsuffix : str
The suffixes to add to the left (lsuffix) and right (rsuffix) column names when avoiding conflicts.
- sort : bool
Set to True to ensure sorted ordering.
- Returns
- joined : DataFrame
Notes
Difference from pandas:
other must be a single DataFrame for now.
on is not supported yet due to lack of multi-index support.
-
keys()
Get the columns. This is index for Series, columns for DataFrame.
- Returns
- Index
Columns of DataFrame.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'one' : [1, 2, 3], 'five' : ['a', 'b', 'c']})
>>> df
   one five
0    1    a
1    2    b
2    3    c
>>> df.keys()
Index(['one', 'five'], dtype='object')
>>> df = cudf.DataFrame(columns=[0, 1, 2, 3])
>>> df
Empty DataFrame
Columns: [0, 1, 2, 3]
Index: []
>>> df.keys()
Int64Index([0, 1, 2, 3], dtype='int64')
-
kurt(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
Return Fisher’s unbiased kurtosis of a sample.
Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.
- Parameters
- skipna: bool, default True
Exclude NA/null values when computing the result.
- Returns
- Series
Notes
Parameters currently not supported are axis, level and numeric_only.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.kurt()
a   -1.2
b   -1.2
dtype: float64
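Fisher's unbiased kurtosis is the bias-corrected excess-kurtosis estimator usually written G2. The sketch below writes it out in plain Python. It assumes cuDF uses the same estimator as pandas; the -1.2 in the example above is consistent with this. `kurtosis_unbiased` is a hypothetical helper and needs at least four non-constant observations.

```python
def kurtosis_unbiased(x):
    """Bias-corrected Fisher (excess) kurtosis G2 (hypothetical helper)."""
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x)
    m4 = sum((v - mean) ** 4 for v in x)
    var = m2 / (n - 1)  # sample variance, normalized by N - 1
    term1 = (n + 1) * n / ((n - 1) * (n - 2) * (n - 3)) * m4 / var ** 2
    term2 = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return term1 - term2  # 0.0 for normally distributed data, on average

print(round(kurtosis_unbiased([1, 2, 3, 4]), 6))  # -1.2
```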
-
kurtosis(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
Return Fisher’s unbiased kurtosis of a sample.
Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.
- Parameters
- skipna: bool, default True
Exclude NA/null values when computing the result.
- Returns
- Series
Notes
Parameters currently not supported are axis, level and numeric_only.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.kurtosis()
a   -1.2
b   -1.2
dtype: float64
-
label_encoding(column, prefix, cats, prefix_sep='_', dtype=None, na_sentinel=-1)
Encode labels in a column with label encoding.
- Parameters
- column : str
The source column with binary encoding for the data.
- prefix : str
The new column name prefix.
- cats : sequence of ints
The sequence of categories as integers.
- prefix_sep : str
The separator between the prefix and the category.
- dtype
The dtype for the outputs; see Series.label_encoding.
- na_sentinel : number
Value to indicate missing category.
- Returns
- A new DataFrame with a new column appended for the coded values.
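At its core, label encoding is a lookup from each category to its position in cats. Here is a hypothetical plain-Python sketch. It assumes that values which are missing or absent from cats both receive na_sentinel. The naming of the appended column (prefix, prefix_sep) is not modelled.

```python
def label_encode(values, cats, na_sentinel=-1):
    """Map each value to its position in cats; anything else -> na_sentinel (hypothetical)."""
    codes = {cat: i for i, cat in enumerate(cats)}
    return [codes.get(v, na_sentinel) for v in values]

print(label_encode([10, 20, 10, 30, None], cats=[10, 20]))  # [0, 1, 0, -1, -1]
```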
-
property
loc
Select rows and columns by label or boolean mask.
See also
Notes
One notable difference from Pandas arises when the DataFrame has mixed types and Pandas would return a Series. In that case cuDF returns a DataFrame instead, because it does not yet support mixed types within a Series.
Mixed-dtype single-row output is returned as a DataFrame (Pandas returns a Series):
>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"]})
>>> df.loc[0]
   a  b
0  1  a
Examples
DataFrame with string index.
>>> df = cudf.DataFrame({'a': [0, 1, 2, 3, 4], 'b': [5, 6, 7, 8, 9]},
...                     index=['a', 'b', 'c', 'd', 'e'])
>>> print(df)
   a  b
a  0  5
b  1  6
c  2  7
d  3  8
e  4  9
Select a single row by label.
>>> print(df.loc['a'])
a    0
b    5
Name: a, dtype: int64
Select multiple rows and a single column.
>>> print(df.loc[['a', 'c', 'e'], 'b'])
a    5
c    7
e    9
Name: b, dtype: int64
Selection by boolean mask.
>>> print(df.loc[df.a > 2])
   a  b
d  3  8
e  4  9
Setting values using loc.
>>> df.loc[['a', 'c', 'e'], 'a'] = 0
>>> print(df)
   a  b
a  0  5
b  1  6
c  0  7
d  3  8
e  0  9
-
log()
Get the natural logarithm of all elements, element-wise.
Natural logarithm is the inverse of the exp function, so that x.log().exp() = x for positive x (up to floating-point rounding).
- Returns
- DataFrame/Series/Index
Result of the element-wise natural logarithm.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.log()
0         NaN
1        -inf
2    0.000000
3   -1.125963
4   -0.693147
5         NaN
6    4.605170
dtype: float64
log operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.log()
      first    second
0       NaN -1.452434
1       NaN -1.203973
2 -0.693147  2.302585
log operation on Index:
>>> index = cudf.Index([10, 11, 500.0])
>>> index
Float64Index([10.0, 11.0, 500.0], dtype='float64')
>>> index.log()
Float64Index([2.302585092994046, 2.3978952727983707, 6.214608098422191], dtype='float64')
-
mask(cond, other=None, inplace=False)
Replace values where the condition is True.
- Parameters
- cond : bool Series/DataFrame, array-like
Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.
- other: scalar, list of scalars, Series/DataFrame
Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.
For a DataFrame, other must be a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.
For a Series, other must be a scalar or a Series-like object of the same length.
- inplace : bool, default False
Whether to perform the operation in place on the data.
- Returns
- Same type as caller
Examples
>>> import cudf
>>> df = cudf.DataFrame({"A": [1, 4, 5], "B": [3, 5, 8]})
>>> df.mask(df % 2 == 0, [-1, -1])
   A   B
0  1   3
1 -1   5
2  5  -1
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.mask(ser > 2, 10)
0    10
1    10
2     2
3     1
4     0
dtype: int64
>>> ser.mask(ser > 2)
0    null
1    null
2       2
3       1
4       0
dtype: int64
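For Series-like input, the replacement rule reduces to a per-element choice. `mask_values` is a hypothetical plain-Python helper, with None standing in for null. The per-column list form that DataFrames accept is not modelled.

```python
def mask_values(values, cond, other=None):
    """Replace values where cond is True (hypothetical helper)."""
    if isinstance(other, (list, tuple)):
        # Series-like other: take the replacement from the same position.
        return [o if c else v for v, c, o in zip(values, cond, other)]
    # Scalar other (default None, i.e. null).
    return [other if c else v for v, c in zip(values, cond)]

ser = [4, 3, 2, 1, 0]
cond = [v > 2 for v in ser]
print(mask_values(ser, cond, 10))  # [10, 10, 2, 1, 0]
print(mask_values(ser, cond))      # [None, None, 2, 1, 0]
```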
-
max(axis=None, skipna=None, dtype=None, level=None, numeric_only=None, **kwargs)
Return the maximum of the values in the DataFrame.
- Parameters
- skipna: bool, default True
Exclude NA/null values when computing the result.
- dtype: data type
Data type to cast the result to.
- Returns
- Series
Notes
Parameters currently not supported are axis, level, numeric_only.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.max()
a     4
b    10
dtype: int64
-
mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
Return the mean of the values for the requested axis.
- Parameters
- axis : {0 or ‘index’, 1 or ‘columns’}
Axis for the function to be applied on.
- skipna : bool, default True
Exclude NA/null values when computing the result.
- level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
- numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns
- mean : Series or DataFrame (if level specified)
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.mean()
a    2.5
b    8.5
dtype: float64
-
melt(**kwargs)
Unpivot a DataFrame from wide format to long format, optionally leaving identifier variables set.
- Parameters
- frame : DataFrame
- id_vars : tuple, list, or ndarray, optional
Column(s) to use as identifier variables. Default: None.
- value_vars : tuple, list, or ndarray, optional
Column(s) to unpivot. Default: all columns that are not set as id_vars.
- var_name : scalar
Name to use for the variable column. Default: frame.columns.name or ‘variable’.
- value_name : str
Name to use for the value column. Default: ‘value’.
- Returns
- out : DataFrame
Melted result.
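Below is a plain-Python sketch of the wide-to-long reshaping, working on a dict of equal-length lists. `melt_columns` is a hypothetical helper that uses the default names variable and value. It repeats the id_vars values once per unpivoted column and stacks the unpivoted columns one after another.

```python
def melt_columns(columns, id_vars=(), value_vars=None,
                 var_name="variable", value_name="value"):
    """Unpivot a dict of column -> list from wide to long format (hypothetical helper)."""
    if value_vars is None:
        value_vars = [c for c in columns if c not in id_vars]
    n_rows = len(next(iter(columns.values())))
    out = {c: [] for c in (*id_vars, var_name, value_name)}
    for var in value_vars:              # one block of rows per unpivoted column
        for i in range(n_rows):
            for idc in id_vars:
                out[idc].append(columns[idc][i])
            out[var_name].append(var)
            out[value_name].append(columns[var][i])
    return out

wide = {"A": ["a", "b"], "B": [1, 2], "C": [3, 4]}
print(melt_columns(wide, id_vars=["A"]))
# {'A': ['a', 'b', 'a', 'b'], 'variable': ['B', 'B', 'C', 'C'], 'value': [1, 2, 3, 4]}
```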
-
memory_usage(index=True, deep=False)
Return the memory usage of each column in bytes. The memory usage can optionally include the contribution of the index and elements of object dtype.
- Parameters
- index : bool, default True
Specifies whether to include the memory usage of the DataFrame’s index in returned Series. If index=True, the memory usage of the index is the first item in the output.
- deep : bool, default False
If True, introspect the data deeply by interrogating object dtypes for system-level memory consumption, and include it in the returned values.
- Returns
- Series
A Series whose index is the original column names and whose values are the memory usage of each column in bytes.
Examples
>>> import cudf
>>> import numpy as np
>>> dtypes = ['int64', 'float64', 'object', 'bool']
>>> data = dict([(t, np.ones(shape=5000).astype(t))
...              for t in dtypes])
>>> df = cudf.DataFrame(data)
>>> df.head()
   int64  float64  object  bool
0      1      1.0     1.0  True
1      1      1.0     1.0  True
2      1      1.0     1.0  True
3      1      1.0     1.0  True
4      1      1.0     1.0  True
>>> df.memory_usage(index=False)
int64      40000
float64    40000
object     40000
bool        5000
dtype: int64
Use a Categorical for efficient storage of an object-dtype column with many repeated values.
>>> df['object'].astype('category').memory_usage(deep=True)
5048
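Without deep introspection, a fixed-width column costs its byte width times the number of rows. This reproduces the 40000 and 5000 figures for the 5000-row example above. The width table is an assumption for these three dtypes: bool is taken as one byte per element, as the example's 5000 suggests. Validity masks and the index are ignored.

```python
# Assumed byte widths for the fixed-width dtypes in the example above.
ITEMSIZE = {"int64": 8, "float64": 8, "bool": 1}

def shallow_bytes(dtype, n_rows):
    """Data-buffer bytes only; null masks and the index are ignored (hypothetical helper)."""
    return ITEMSIZE[dtype] * n_rows

for dtype in ITEMSIZE:
    print(dtype, shallow_bytes(dtype, 5000))
# int64 40000
# float64 40000
# bool 5000
```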
-
merge(right, on=None, left_on=None, right_on=None, left_index=False, right_index=False, how='inner', sort=False, lsuffix=None, rsuffix=None, type='', method='hash', indicator=False, suffixes=('_x', '_y'))¶ Merge GPU DataFrame objects by performing a database-style join operation by columns or indexes.
- Parameters
- right : DataFrame
- on : label or list, default None
Column or index level names to join on. These must be found in both DataFrames.
If on is None and not merging on indexes, this defaults to the intersection of the columns in both DataFrames.
- how : {‘left’, ‘outer’, ‘inner’}, default ‘inner’
Type of merge to be performed.
left : use only keys from left frame, similar to a SQL left outer join.
right : not supported.
outer : use union of keys from both frames, similar to a SQL full outer join.
inner : use intersection of keys from both frames, similar to a SQL inner join.
- left_on : label or list, or array-like
Column or index level names to join on in the left DataFrame. Can also be an array or list of arrays of the length of the left DataFrame. These arrays are treated as if they are columns.
- right_on : label or list, or array-like
Column or index level names to join on in the right DataFrame. Can also be an array or list of arrays of the length of the right DataFrame. These arrays are treated as if they are columns.
- left_index : bool, default False
Use the index from the left DataFrame as the join key(s).
- right_index : bool, default False
Use the index from the right DataFrame as the join key(s).
- sort : bool, default False
Sort the resulting DataFrame by the columns that were merged on, starting from the left.
- suffixes : Tuple[str, str], default (‘_x’, ‘_y’)
Suffixes applied to overlapping column names on the left and right sides.
- method : {‘hash’, ‘sort’}, default ‘hash’
The implementation method to be used for the operation.
- Returns
- merged : DataFrame
Notes
Merges in cuDF produce non-deterministic row ordering. If you need a stable order, sort the result (for example with sort_values).
Examples
>>> import cudf
>>> df_a = cudf.DataFrame()
>>> df_a['key'] = [0, 1, 2, 3, 4]
>>> df_a['vals_a'] = [float(i + 10) for i in range(5)]
>>> df_b = cudf.DataFrame()
>>> df_b['key'] = [1, 2, 4]
>>> df_b['vals_b'] = [float(i+10) for i in range(3)]
>>> df_merged = df_a.merge(df_b, on=['key'], how='left')
>>> df_merged.sort_values('key')
   key  vals_a  vals_b
3    0    10.0
0    1    11.0    10.0
1    2    12.0    11.0
4    3    13.0
2    4    14.0    12.0
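A small build-and-probe hash join in pure Python can illustrate how `how='left'` behaves together with the default `method='hash'`. This is a hypothetical sketch on lists of dict rows, not cuDF's implementation. It supports only a single key column and does not apply `suffixes` to overlapping column names. It also preserves left-row order, which cuDF does not guarantee (see the Notes).

```python
from collections import defaultdict


def left_merge(left, right, on):
    """Left-join two lists of dict rows on the single key column `on`."""
    # Build phase: bucket the right-hand rows by their join key.
    buckets = defaultdict(list)
    for row in right:
        buckets[row[on]].append(row)
    right_cols = {col for row in right for col in row if col != on}

    merged = []
    # Probe phase: every left row survives; unmatched rows get None.
    for lrow in left:
        matches = buckets.get(lrow[on])
        if matches:
            for rrow in matches:
                merged.append({**lrow, **{c: rrow.get(c) for c in right_cols}})
        else:
            merged.append({**lrow, **{c: None for c in right_cols}})
    return merged


# Same data as the cuDF example above; vals_b is None for keys 0 and 3.
df_a = [{"key": k, "vals_a": float(k + 10)} for k in range(5)]
df_b = [{"key": k, "vals_b": float(i + 10)} for i, k in enumerate([1, 2, 4])]
for row in sorted(left_merge(df_a, df_b, "key"), key=lambda r: r["key"]):
    print(row)
```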
-
min(axis=None, skipna=None, dtype=None, level=None, numeric_only=None, **kwargs)¶ Return the minimum of the values in the DataFrame.
- Parameters
- skipna: bool, default True
Exclude NA/null values when computing the result.
- dtype: data type
Data type to cast the result to.
- Returns
- Series
Notes
Parameters currently not supported are axis, level, numeric_only.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.min()
a    1
b    7
dtype: int64
-
mod(other, axis='columns', level=None, fill_value=None)¶ Get Modulo division of dataframe and other, element-wise (binary operator mod).
Equivalent to dataframe % other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rmod.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'angles': [0, 3, 4],
...                      'degrees': [360, 180, 360]},
...                     index=['circle', 'triangle', 'rectangle'])
>>> df % 100
           angles  degrees
circle          0       60
triangle        3       80
rectangle       4       60
>>> df.mod(100)
           angles  degrees
circle          0       60
triangle        3       80
rectangle       4       60
-
mul(other, axis='columns', level=None, fill_value=None)¶ Get Multiplication of dataframe and other, element-wise (binary operator mul).
Equivalent to dataframe * other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rmul.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'angles': [0, 3, 4],
...                      'degrees': [360, 180, 360]},
...                     index=['circle', 'triangle', 'rectangle'])
>>> other = cudf.DataFrame({'angles': [0, 3, 4]},
...                        index=['circle', 'triangle', 'rectangle'])
>>> df * other
           angles degrees
circle          0    null
triangle        9    null
rectangle      16    null
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0        0
triangle        9        0
rectangle      16        0
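All of these flexible arithmetic methods share the same `fill_value` rule. When exactly one side is missing, the fill value is substituted for it. When both sides are missing, the result stays missing. The pure-Python sketch below shows this rule. The helper `mul_with_fill` and the dict-of-lists frames are hypothetical. To keep it simple, the sketch aligns on column names only and assumes both frames already share the same row index.

```python
def mul_with_fill(left, right, fill_value=None):
    """Multiply two dict-of-lists frames column by column, aligned on names."""
    n_rows = len(next(iter(left.values())))
    out = {}
    for col in sorted(set(left) | set(right)):
        # A column absent from one frame counts as all-missing on that side.
        lcol = left.get(col, [None] * n_rows)
        rcol = right.get(col, [None] * n_rows)
        result = []
        for a, b in zip(lcol, rcol):
            if a is None and b is not None:
                a = fill_value
            elif b is None and a is not None:
                b = fill_value
            # Still missing if both sides were missing (or no fill_value given).
            result.append(None if a is None or b is None else a * b)
        out[col] = result
    return out


df = {"angles": [0, 3, 4], "degrees": [360, 180, 360]}
other = {"angles": [0, 3, 4]}
print(mul_with_fill(df, other))
# {'angles': [0, 9, 16], 'degrees': [None, None, None]}
print(mul_with_fill(df, other, fill_value=0))
# {'angles': [0, 9, 16], 'degrees': [0, 0, 0]}
```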
-
nans_to_nulls()¶ Convert nans (if any) to nulls.
-
property
ndim¶ Dimension of the data. DataFrame ndim is always 2.
-
nlargest(n, columns, keep='first')¶ Return the first n rows of the DataFrame ordered by columns in descending order, that is, the rows with the n largest values.
Notes
- Difference from pandas:
Only a single column is supported in columns.
-
notna()¶ Identify non-missing values. Alias for notnull.
-
notnull()¶ Identify non-missing values.
-
nsmallest(n, columns, keep='first')¶ Return the first n rows of the DataFrame ordered by columns in ascending order, that is, the rows with the n smallest values.
Notes
- Difference from pandas:
Only a single column is supported in columns.
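With keep='first', a tie is resolved in favor of the row that appears first. The pure-Python sketch below implements both methods on a list of dict rows and uses a single column, as the documented restriction requires. It relies on Python's sort being stable, which also holds with reverse=True. The helper names mirror the methods but are hypothetical and are not cuDF code.

```python
def nlargest(rows, n, column):
    """First n rows ordered by `column` descending; ties keep input order."""
    return sorted(rows, key=lambda r: r[column], reverse=True)[:n]


def nsmallest(rows, n, column):
    """First n rows ordered by `column` ascending; ties keep input order."""
    return sorted(rows, key=lambda r: r[column])[:n]


rows = [{"id": "a", "v": 3}, {"id": "b", "v": 7},
        {"id": "c", "v": 7}, {"id": "d", "v": 1}]
print([r["id"] for r in nlargest(rows, 2, "v")])   # ['b', 'c']
print([r["id"] for r in nsmallest(rows, 2, "v")])  # ['d', 'a']
```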
-
one_hot_encoding(column, prefix, cats, prefix_sep='_', dtype='float64')¶ Expand a column with one-hot-encoding.
- Parameters
- column : str
The source column containing the integer-encoded category values.
- prefix : str
The new column name prefix.
- cats : sequence of ints
The sequence of categories as integers.
- prefix_sep : str
The separator between the prefix and the category.
- dtype : dtype, default ‘float64’
The dtype for the outputs.
- Returns
- A new DataFrame with a new column appended for each category.
Examples
>>> import pandas as pd
>>> import cudf
>>> pet_owner = [1, 2, 3, 4, 5]
>>> pet_type = ['fish', 'dog', 'fish', 'bird', 'fish']
>>> df = pd.DataFrame({'pet_owner': pet_owner, 'pet_type': pet_type})
>>> df.pet_type = df.pet_type.astype('category')
Create a column with numerically encoded category values
>>> df['pet_codes'] = df.pet_type.cat.codes
>>> gdf = cudf.from_pandas(df)
Create the list of category codes to use in the encoding
>>> codes = gdf.pet_codes.unique()
>>> gdf.one_hot_encoding('pet_codes', 'pet_dummy', codes).head()
   pet_owner pet_type  pet_codes  pet_dummy_0  pet_dummy_1  pet_dummy_2
0          1     fish          2          0.0          0.0          1.0
1          2      dog          1          0.0          1.0          0.0
2          3     fish          2          0.0          0.0          1.0
3          4     bird          0          1.0          0.0          0.0
4          5     fish          2          0.0          0.0          1.0
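The encoding amounts to one equality test per category. The pure-Python sketch below rebuilds the three dummy columns from the example's `pet_codes` values. The helper is hypothetical and returns plain lists rather than a DataFrame. It emits floats, matching the documented default dtype.

```python
def one_hot_encoding(codes, prefix, cats, prefix_sep="_"):
    """Map each category to a 1.0/0.0 column named prefix + prefix_sep + cat."""
    return {
        f"{prefix}{prefix_sep}{cat}": [1.0 if code == cat else 0.0 for code in codes]
        for cat in cats
    }


pet_codes = [2, 1, 2, 0, 2]  # fish, dog, fish, bird, fish
for name, values in one_hot_encoding(pet_codes, "pet_dummy", [0, 1, 2]).items():
    print(name, values)
# pet_dummy_0 [0.0, 0.0, 0.0, 1.0, 0.0]
# pet_dummy_1 [0.0, 1.0, 0.0, 0.0, 0.0]
# pet_dummy_2 [1.0, 0.0, 1.0, 0.0, 1.0]
```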
-
partition_by_hash(columns, nparts, keep_index=True)¶ Partition the dataframe by the hashed value of data in columns.
- Parameters
- columns : sequence of str
The names of the columns to be hashed. Must have at least one name.
- nparts : int
Number of output partitions.
- keep_index : boolean
Whether to keep the index or drop it.
- Returns
- partitioned : list of DataFrame
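Hash partitioning guarantees that rows with equal key values land in the same output partition. The pure-Python sketch below illustrates that idea. It uses Python's built-in `hash`, so the partition a given row lands in will not match cuDF's. The helper name and the list-of-dict rows are assumptions made for illustration.

```python
def partition_by_hash(rows, columns, nparts):
    """Split list-of-dict rows into nparts lists by hashing the key columns."""
    if not columns:
        raise ValueError("columns must have at least one name")
    parts = [[] for _ in range(nparts)]
    for row in rows:
        key = tuple(row[c] for c in columns)
        # Equal keys give equal hashes, so they always share a partition.
        parts[hash(key) % nparts].append(row)
    return parts


rows = [{"k": i % 3, "v": i} for i in range(9)]
parts = partition_by_hash(rows, ["k"], nparts=2)
print([len(p) for p in parts])  # sizes depend on the hash; they sum to 9
```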
-
pop(item)¶ Return a column and drop it from the DataFrame.
-
pow(other, axis='columns', level=None, fill_value=None)¶ Get Exponential power of dataframe and other, element-wise (binary operator pow).
Equivalent to dataframe ** other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rpow.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'angles': [1, 3, 4],
...                      'degrees': [360, 180, 360]},
...                     index=['circle', 'triangle', 'rectangle'])
>>> df ** 2
           angles  degrees
circle          1   129600
triangle        9    32400
rectangle      16   129600
>>> df.pow(2)
           angles  degrees
circle          1   129600
triangle        9    32400
rectangle      16   129600
-
prod(axis=None, skipna=None, dtype=None, level=None, numeric_only=None, min_count=0, **kwargs)¶ Return product of the values in the DataFrame.
- Parameters
- skipna: bool, default True
Exclude NA/null values when computing the result.
- dtype: data type
Data type to cast the result to.
- min_count: int, default 0
The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
With the default of 0, the sum of an all-NA or empty Series is 0, and the product of an all-NA or empty Series is 1.
- Returns
- Series
Notes
Parameters currently not supported are axis, level, numeric_only.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.prod()
a      24
b    5040
dtype: int64
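The pure-Python sketch below models the `min_count` and `skipna` rules for a single column, with None standing in for a null. The helper `prod` is hypothetical. It covers only the documented semantics: an empty product is 1, and fewer than `min_count` valid values gives a missing result. It does not reflect cuDF's GPU reduction.

```python
import math


def prod(values, skipna=True, min_count=0):
    """Product of one column in which None marks a null value."""
    valid = [v for v in values if v is not None]
    if not skipna and len(valid) != len(values):
        return None  # a null propagates when nulls are not skipped
    if len(valid) < min_count:
        return None  # too few valid values
    return math.prod(valid)  # math.prod([]) == 1


print(prod([1, 2, 3, 4]))               # 24
print(prod([None, None]))               # 1
print(prod([None, None], min_count=1))  # None
print(prod([7, None, 9], min_count=2))  # 63
```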
-
product(axis=None, skipna=None, dtype=None, level=None, numeric_only=None, min_count=0, **kwargs)¶ Return product of the values in the DataFrame.
- Parameters
- skipna: bool, default True
Exclude NA/null values when computing the result.
- dtype: data type
Data type to cast the result to.
- min_count: int, default 0
The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
With the default of 0, the sum of an all-NA or empty Series is 0, and the product of an all-NA or empty Series is 1.
- Returns
- Series
Notes
Parameters currently not supported are axis, level, numeric_only.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.product()
a      24
b    5040
dtype: int64
-
quantile(q=0.5, axis=0, numeric_only=True, interpolation='linear', columns=None, exact=True)¶ Return values at the given quantile.
- Parameters
- q : float or array-like
0 <= q <= 1, the quantile(s) to compute.
- axis : int
axis is a NON-FUNCTIONAL parameter.
- numeric_only : boolean
numeric_only is a NON-FUNCTIONAL parameter.
- interpolation : {‘linear’, ‘lower’, ‘higher’, ‘midpoint’, ‘nearest’}
This parameter specifies the interpolation method to use when the desired quantile lies between two data points i and j. Default ‘linear’.
- columns : list of str
List of column names to include.
- exact : boolean, default True
Whether to use the exact (True) or approximate (False) quantile algorithm.
- Returns
- DataFrame
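The interpolation options differ only in how they handle a quantile position that falls between two sorted data points i and j. The pure-Python sketch below computes one quantile of a list under each option. It assumes the common position rule q * (n - 1), which NumPy and pandas use. Its tie-breaking for ‘nearest’ relies on Python's round-half-to-even, and that choice is an assumption, not confirmed cuDF behavior.

```python
import math


def quantile(values, q, interpolation="linear"):
    """Quantile of a non-empty list under the given interpolation rule."""
    data = sorted(values)
    pos = q * (len(data) - 1)  # fractional position between points i and j
    i, j = math.floor(pos), math.ceil(pos)
    lo, hi = data[i], data[j]
    frac = pos - i
    if interpolation == "linear":
        return lo + (hi - lo) * frac
    if interpolation == "lower":
        return lo
    if interpolation == "higher":
        return hi
    if interpolation == "midpoint":
        return (lo + hi) / 2
    if interpolation == "nearest":
        return data[round(pos)]  # ties round half to even (assumption)
    raise ValueError(f"unknown interpolation: {interpolation}")


data = [1, 2, 3, 4]
for method in ("linear", "lower", "higher", "midpoint"):
    print(method, quantile(data, 0.5, method))
# linear 2.5
# lower 2
# higher 3
# midpoint 2.5
```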
-
quantiles(q=0.5, interpolation='nearest')¶ Return values at the given quantile.
- Parameters
- q : float or array-like
0 <= q <= 1, the quantile(s) to compute.
- interpolation : {‘lower’, ‘higher’, ‘nearest’}
This parameter specifies the interpolation method to use when the desired quantile lies between two data points i and j. Default ‘nearest’.
- Returns
- DataFrame
-
query(expr, local_dict={})¶ Query with a boolean expression using Numba to compile a GPU kernel.
See pandas.DataFrame.query.
- Parameters
- expr : str
A boolean expression. Names in the expression refer to columns. index can be used instead of the index name, but this is not supported for MultiIndex.
Names starting with @ refer to Python variables.
An output value will be null if any of the input values are null, regardless of the expression.
- local_dict : dict
A dictionary of local variables to be used in the query.
- Returns
- filtered : DataFrame
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 2], 'b': [3, 4, 5]})
>>> expr = "(a == 2 and b == 4) or (b == 3)"
>>> print(df.query(expr))
   a  b
0  1  3
1  2  4
DateTime conditionals:
>>> import numpy as np
>>> import datetime
>>> df = cudf.DataFrame()
>>> data = np.array(['2018-10-07', '2018-10-08'], dtype='datetime64')
>>> df['datetimes'] = data
>>> search_date = datetime.datetime.strptime('2018-10-08', '%Y-%m-%d')
>>> print(df.query('datetimes==@search_date'))
                datetimes
1 2018-10-08T00:00:00.000
Using local_dict:
>>> import numpy as np
>>> import datetime
>>> df = cudf.DataFrame()
>>> data = np.array(['2018-10-07', '2018-10-08'], dtype='datetime64')
>>> df['datetimes'] = data
>>> search_date2 = datetime.datetime.strptime('2018-10-08', '%Y-%m-%d')
>>> print(df.query('datetimes==@search_date',
...                local_dict={'search_date': search_date2}))
                datetimes
1 2018-10-08T00:00:00.000
-
radd(other, axis=1, level=None, fill_value=None)¶ Get Addition of dataframe and other, element-wise (binary operator radd).
Equivalent to other + dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, add.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'angles': [0, 3, 4],
...                      'degrees': [360, 180, 360]},
...                     index=['circle', 'triangle', 'rectangle'])
>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.radd(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
-
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)¶ Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.
- Parameters
- axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Index to direct ranking.
- method : {‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’
How to rank the group of records that have the same value (i.e. ties):
* average: average rank of the group
* min: lowest rank in the group
* max: highest rank in the group
* first: ranks assigned in order they appear in the array
* dense: like ‘min’, but rank always increases by 1 between groups.
- numeric_only : bool, optional
For DataFrame objects, rank only numeric columns if set to True.
- na_option : {‘keep’, ‘top’, ‘bottom’}, default ‘keep’
How to rank NaN values:
* keep: assign NaN rank to NaN values
* top: assign smallest rank to NaN values if ascending
* bottom: assign highest rank to NaN values if ascending.
- ascending : bool, default True
Whether or not the elements should be ranked in ascending order.
- pct : bool, default False
Whether or not to display the returned rankings in percentile form.
Whether or not to display the returned rankings in percentile form.
- Returns
- same type as caller
Return a Series or DataFrame with data ranks as values.
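The five tie-breaking methods are easiest to compare side by side. The pure-Python sketch below ranks a list in ascending order. It deliberately omits `na_option` and `pct` handling. The helper is hypothetical and is not cuDF's implementation. It returns floats, as pandas-style rank does.

```python
def rank(values, method="average"):
    """Ascending ranks 1..n for a list, resolving ties according to `method`."""
    if method not in ("average", "min", "max", "first", "dense"):
        raise ValueError(f"unknown method: {method}")
    order = sorted(range(len(values)), key=lambda i: values[i])  # stable sort
    ranks = [0.0] * len(values)
    start, group = 0, 0
    while start < len(order):
        # Extend `end` over the run of values tied with the one at `start`.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        group += 1  # dense rank of this tie group
        for k in range(start, end + 1):
            ranks[order[k]] = {
                "average": (start + end) / 2 + 1,
                "min": start + 1,
                "max": end + 1,
                "first": k + 1,  # order of appearance, thanks to the stable sort
                "dense": group,
            }[method]
        start = end + 1
    return [float(r) for r in ranks]


values = [10, 20, 20, 30]
for method in ("average", "min", "max", "first", "dense"):
    print(method, rank(values, method))
# average [1.0, 2.5, 2.5, 4.0]
# min [1.0, 2.0, 2.0, 4.0]
# max [1.0, 3.0, 3.0, 4.0]
# first [1.0, 2.0, 3.0, 4.0]
# dense [1.0, 2.0, 2.0, 3.0]
```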
-
rdiv(other, axis='columns', level=None, fill_value=None)¶ Get Floating division of dataframe and other, element-wise (binary operator rtruediv).
Equivalent to other / dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, truediv.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'angles': [0, 3, 4],
...                      'degrees': [360, 180, 360]},
...                     index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360
>>> df.rtruediv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
>>> 10 / df
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
-
reindex(labels=None, axis=0, index=None, columns=None, copy=True)¶ Return a new DataFrame whose axes conform to a new index.
DataFrame.reindex supports two calling conventions:
(index=index_labels, columns=column_names)
(labels, axis={0 or 'index', 1 or 'columns'})
- Parameters
- labels : Index, Series-convertible, optional, default None
- axis : {0 or ‘index’, 1 or ‘columns’}, optional, default 0
- index : Index, Series-convertible, optional, default None
Shorthand for df.reindex(labels=index_labels, axis=0).
- columns : array-like, optional, default None
Shorthand for df.reindex(labels=column_names, axis=1).
- copy : boolean, optional, default True
- Returns
- A DataFrame whose axes conform to the new index(es).
Examples
>>> import cudf
>>> df = cudf.DataFrame()
>>> df['key'] = [0, 1, 2, 3, 4]
>>> df['val'] = [float(i + 10) for i in range(5)]
>>> df_new = df.reindex(index=[0, 3, 4, 5],
...                     columns=['key', 'val', 'sum'])
>>> print(df)
   key   val
0    0  10.0
1    1  11.0
2    2  12.0
3    3  13.0
4    4  14.0
>>> print(df_new)
   key   val  sum
0    0  10.0  NaN
3    3  13.0  NaN
4    4  14.0  NaN
5   -1   NaN  NaN
-
rename(mapper=None, index=None, columns=None, axis=0, copy=True, inplace=False, level=None, errors='ignore')¶ Alter column and index labels.
Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is. Extra labels listed don’t throw an error.
DataFrame.rename supports two calling conventions:
(index=index_mapper, columns=columns_mapper, ...)
(mapper, axis={0/'index' or 1/'columns'}, ...)
We highly recommend using keyword arguments to clarify your intent.
- Parameters
- mapper : dict-like or function, default None
Optional dict-like or function transformations to apply to the index/column values depending on the selected axis.
- index : dict-like, default None
Optional dict-like transformations to apply to the index axis’ values. Does not support functions for axis 0 yet.
- columns : dict-like or function, default None
Optional dict-like or function transformations to apply to the columns axis’ values.
- axis : int, default 0
Axis to rename with mapper: 0 or ‘index’ for the index, 1 or ‘columns’ for the columns.
- copy : boolean, default True
Also copy the underlying data.
- inplace : boolean, default False
If False, return a new DataFrame. If True, assign columns without copying.
- level : int or level name, default None
In case of a MultiIndex, only rename labels in the specified level.
- errors : {‘raise’, ‘ignore’, ‘warn’}, default ‘ignore’
Only ‘ignore’ is supported. Controls raising of exceptions on invalid data for the provided dtype.
raise : allow exceptions to be raised.
ignore : suppress exceptions. On error, return the original object.
warn : print the last exceptions as warnings and return the original object.
- Returns
- DataFrame
Notes
- Difference from pandas:
Not supporting: level
Rename will not overwrite column names. If a list with duplicates is passed, column names will be postfixed with a number.
-
repeat(repeats, axis=None)¶ Repeats elements consecutively.
Returns a new object of the caller’s type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.
- Parameters
- repeats : int, or array of ints
The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.
- Returns
- Series/DataFrame/Index
A newly created object of same type as caller with repeated elements.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})
>>> df
   a   b
0  1  10
1  2  20
2  3  30
>>> df.repeat(3)
   a   b
0  1  10
0  1  10
0  1  10
1  2  20
1  2  20
1  2  20
2  3  30
2  3  30
2  3  30
Repeat on Series
>>> s = cudf.Series([0, 2])
>>> s
0    0
1    2
dtype: int64
>>> s.repeat([3, 4])
0    0
0    0
0    0
1    2
1    2
1    2
1    2
dtype: int64
>>> s.repeat(2)
0    0
0    0
1    2
1    2
dtype: int64
Repeat on Index
>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33, 33, 33, 33, 33, 55, 55, 55, 55, 55], dtype='int64')
-
replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method=None)¶ Replace values given in to_replace with value.
- Parameters
- to_replace : numeric, str, list-like or dict
Value(s) to replace.
numeric or str:
values equal to to_replace will be replaced with value
list of numeric or str:
If value is also list-like, to_replace and value must be of the same length.
dict:
Dicts can be used to replace different values in different columns. For example, {‘a’: 1, ‘z’: 2} specifies that the value 1 in column a and the value 2 in column z should be replaced with value.
- value : numeric, str, list-like, or dict
Value(s) to replace to_replace with. If a dict is provided, then its keys must match the keys in to_replace, and corresponding values must be compatible (e.g., if they are lists, then they must match in length).
- inplace : bool, default False
If True, perform the replacement in place.
- Returns
- result : DataFrame
DataFrame after replacement.
Notes
Parameters that are currently not supported are: limit, regex, method
Examples
>>> import cudf
>>> gdf = cudf.DataFrame()
>>> gdf['id'] = [0, 1, 2, -1, 4, -1, 6]
>>> gdf['id'] = gdf['id'].replace(-1, None)
>>> gdf
     id
0     0
1     1
2     2
3  null
4     4
5  null
6     6
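The pure-Python sketch below covers the three accepted forms of to_replace: a scalar, a list, and a per-column dict. It operates on a dict-of-lists frame. The helper `replace_values` is hypothetical and uses None for null. For the dict form it supports only a scalar `value`. It illustrates the documented matching rules, not cuDF's implementation.

```python
def replace_values(frame, to_replace, value):
    """Return a copy of a dict-of-lists frame with matching values replaced."""
    if isinstance(to_replace, dict):
        # {'a': 1, 'z': 2}: replace 1 in column a and 2 in column z.
        rules = {col: {old: value} for col, old in to_replace.items()}
    elif isinstance(to_replace, list):
        new = value if isinstance(value, list) else [value] * len(to_replace)
        if len(new) != len(to_replace):
            raise ValueError("to_replace and value must have the same length")
        rules = {col: dict(zip(to_replace, new)) for col in frame}
    else:
        rules = {col: {to_replace: value} for col in frame}
    return {col: [rules.get(col, {}).get(v, v) for v in vals]
            for col, vals in frame.items()}


gdf = {"id": [0, 1, 2, -1, 4, -1, 6]}
print(replace_values(gdf, -1, None))
# {'id': [0, 1, 2, None, 4, None, 6]}
frame = {"a": [1, 2], "z": [1, 2]}
print(replace_values(frame, {"a": 1, "z": 2}, 0))
# {'a': [0, 2], 'z': [1, 0]}
```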
-
reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')¶ Reset the index.
Reset the index of the DataFrame, and use the default one instead.
- Parameters
- drop : bool, default False
Do not try to insert index into dataframe columns. This resets the index to the default integer index.
- inplace : bool, default False
Modify the DataFrame in place (do not create a new object).
- Returns
- DataFrame or None
DataFrame with the new index, or None if inplace=True.
Examples
>>> import numpy as np
>>> import cudf
>>> df = cudf.DataFrame([('bird', 389.0),
...                      ('bird', 24.0),
...                      ('mammal', 80.5),
...                      ('mammal', np.nan)],
...                     index=['falcon', 'parrot', 'lion', 'monkey'],
...                     columns=('class', 'max_speed'))
>>> df
         class max_speed
falcon    bird     389.0
parrot    bird      24.0
lion    mammal      80.5
monkey  mammal      null
>>> df.reset_index()
    index   class max_speed
0  falcon    bird     389.0
1  parrot    bird      24.0
2    lion  mammal      80.5
3  monkey  mammal      null
>>> df.reset_index(drop=True)
    class max_speed
0    bird     389.0
1    bird      24.0
2  mammal      80.5
3  mammal      null
-
rfloordiv(other, axis='columns', level=None, fill_value=None)¶ Get Integer division of dataframe and other, element-wise (binary operator rfloordiv).
Equivalent to other // dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, floordiv.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'col1': [10, 11, 23],
...                      'col2': [101, 122, 321]})
>>> df
   col1  col2
0    10   101
1    11   122
2    23   321
>>> df.rfloordiv(df)
   col1  col2
0     1     1
1     1     1
2     1     1
>>> df.rfloordiv(200)
   col1  col2
0    20     1
1    18     1
2     8     0
>>> df.rfloordiv(100)
   col1  col2
0    10     0
1     9     0
2     4     0
-
rmod(other, axis='columns', level=None, fill_value=None)¶ Get Modulo division of dataframe and other, element-wise (binary operator rmod).
Equivalent to other % dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, mod.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'angles': [1, 3, 4],
...                      'degrees': [360, 180, 360]},
...                     index=['circle', 'triangle', 'rectangle'])
>>> 100 % df
           angles  degrees
circle          0      100
triangle        1      100
rectangle       0      100
>>> df.rmod(100)
           angles  degrees
circle          0      100
triangle        1      100
rectangle       0      100
-
rmul(other, axis='columns', level=None, fill_value=None)¶ Get Multiplication of dataframe and other, element-wise (binary operator rmul).
Equivalent to other * dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, mul.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'angles': [0, 3, 4],
...                      'degrees': [360, 180, 360]},
...                     index=['circle', 'triangle', 'rectangle'])
>>> other = cudf.DataFrame({'angles': [0, 3, 4]},
...                        index=['circle', 'triangle', 'rectangle'])
>>> other * df
           angles degrees
circle          0    null
triangle        9    null
rectangle      16    null
>>> df.rmul(other, fill_value=0)
           angles  degrees
circle          0        0
triangle        9        0
rectangle      16        0
-
rolling(window, min_periods=None, center=False, axis=0, win_type=None)¶ Rolling window calculations.
- Parameters
- window : int or offset
Size of the window, i.e., the number of observations used to calculate the statistic. For datetime indexes, an offset can be provided instead of an int. The offset must be convertible to a timedelta. As opposed to a fixed window size, each window will be sized to accommodate observations within the time period specified by the offset.
- min_periods : int, optional
The minimum number of observations in the window that are required to be non-null, so that the result is non-null. If not provided or None, min_periods is equal to the window size.
- center : bool, optional
If True, the result is set at the center of the window. If False (default), the result is set at the right edge of the window.
- Returns
Rolling object.
Examples
>>> import cudf
>>> a = cudf.Series([1, 2, 3, None, 4])
Rolling sum with window size 2.
>>> print(a.rolling(2).sum())
0
1    3
2    5
3
4
dtype: int64
Rolling sum with window size 2 and min_periods 1.
>>> print(a.rolling(2, min_periods=1).sum())
0    1
1    3
2    5
3    3
4    4
dtype: int64
Rolling count with window size 3.
>>> print(a.rolling(3).count())
0    1
1    2
2    3
3    2
4    2
dtype: int64
Rolling count with window size 3, but with the result set at the center of the window.
>>> print(a.rolling(3, center=True).count())
0    2
1    3
2    2
3    2
4    1
dtype: int64
Rolling max with variable window size specified by an offset; only valid for datetime index.
>>> import numpy as np
>>> import pandas as pd
>>> a = cudf.Series(
...     [1, 9, 5, 4, np.nan, 1],
...     index=[
...         pd.Timestamp('20190101 09:00:00'),
...         pd.Timestamp('20190101 09:00:01'),
...         pd.Timestamp('20190101 09:00:02'),
...         pd.Timestamp('20190101 09:00:04'),
...         pd.Timestamp('20190101 09:00:07'),
...         pd.Timestamp('20190101 09:00:08')
...     ]
... )
>>> print(a.rolling('2s').max())
2019-01-01T09:00:00.000    1
2019-01-01T09:00:01.000    9
2019-01-01T09:00:02.000    9
2019-01-01T09:00:04.000    4
2019-01-01T09:00:07.000
2019-01-01T09:00:08.000    1
dtype: int64
Apply custom function on the window with the apply method
>>> import numpy as np
>>> import math
>>> b = cudf.Series([16, 25, 36, 49, 64, 81], dtype=np.float64)
>>> def some_func(A):
...     b = 0
...     for a in A:
...         b = b + math.sqrt(a)
...     return b
...
>>> print(b.rolling(3, min_periods=1).apply(some_func))
0     4.0
1     9.0
2    15.0
3    18.0
4    21.0
5    24.0
dtype: float64
This also works for windows defined by an offset:
>>> import pandas as pd
>>> c = cudf.Series(
...     [16, 25, 36, 49, 64, 81],
...     index=[
...         pd.Timestamp('20190101 09:00:00'),
...         pd.Timestamp('20190101 09:00:01'),
...         pd.Timestamp('20190101 09:00:02'),
...         pd.Timestamp('20190101 09:00:04'),
...         pd.Timestamp('20190101 09:00:07'),
...         pd.Timestamp('20190101 09:00:08')
...     ],
...     dtype=np.float64
... )
>>> print(c.rolling('2s').apply(some_func))
2019-01-01T09:00:00.000     4.0
2019-01-01T09:00:01.000     9.0
2019-01-01T09:00:02.000    11.0
2019-01-01T09:00:04.000     7.0
2019-01-01T09:00:07.000     8.0
2019-01-01T09:00:08.000    17.0
dtype: float64
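The pure-Python sketch below shows how `window`, `min_periods` and `center` interact for integer windows, using None as null. The helper `rolling_sum` is hypothetical. For center=True it follows pandas' window placement; treating that as cuDF's behavior is an assumption, and it is only checked here for odd window sizes. Offset-based windows are not covered. With the series from the examples above, its output agrees with the rolling sums shown. In the centered call, the number of valid values per window (2, 3, 2, 2, 1) matches the centered count.

```python
def rolling_sum(values, window, min_periods=None, center=False):
    """Fixed-size rolling sum over a list in which None marks a null."""
    if min_periods is None:
        min_periods = window
    # center=True shifts the window forward so position i sits in its middle.
    offset = (window - 1) // 2 if center else 0
    out = []
    for i in range(len(values)):
        hi = i + offset          # last position covered by the window
        lo = hi - window + 1     # first position covered by the window
        in_window = values[max(lo, 0):min(hi, len(values) - 1) + 1]
        valid = [v for v in in_window if v is not None]
        out.append(sum(valid) if len(valid) >= min_periods else None)
    return out


a = [1, 2, 3, None, 4]
print(rolling_sum(a, 2))                              # [None, 3, 5, None, None]
print(rolling_sum(a, 2, min_periods=1))               # [1, 3, 5, 3, 4]
print(rolling_sum(a, 3, min_periods=1, center=True))  # [3, 6, 5, 7, 4]
```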
-
rpow(other, axis='columns', level=None, fill_value=None)¶ Get Exponential power of dataframe and other, element-wise (binary operator rpow).
Equivalent to other ** dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, pow.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'angles': [1, 3, 4],
...                      'degrees': [360, 180, 360]},
...                     index=['circle', 'triangle', 'rectangle'])
>>> 1 ** df
           angles  degrees
circle          1        1
triangle        1        1
rectangle       1        1
>>> df.rpow(1)
           angles  degrees
circle          1        1
triangle        1        1
rectangle       1        1
-
rsub(other, axis='columns', level=None, fill_value=None)¶ Get Subtraction of dataframe and other, element-wise (binary operator rsub).
Equivalent to other - dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, sub.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'angles': [0, 3, 4],
...                      'degrees': [360, 180, 360]},
...                     index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360
>>> df.rsub(1)
           angles  degrees
circle          1     -359
triangle       -2     -179
rectangle      -3     -359
>>> df.rsub([1, 2])
           angles  degrees
circle          1     -358
triangle       -2     -178
rectangle      -3     -358
-
rtruediv(other, axis='columns', level=None, fill_value=None)¶ Get Floating division of dataframe and other, element-wise (binary operator rtruediv).
Equivalent to other / dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, truediv.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'angles': [0, 3, 4],
...                      'degrees': [360, 180, 360]},
...                     index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360
>>> df.rtruediv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
>>> 10 / df
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
-
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)
Return a random sample of items from an axis of the object.
You can use random_state for reproducibility.
- Parameters
- n: int, optional
Number of items from axis to return. Cannot be used with frac. Defaults to 1 if frac is None.
- frac: float, optional
Fraction of axis items to return. Cannot be used with n.
- replace: bool, default False
Allow or disallow sampling of the same row more than once. replace=True is not yet supported for axis=1 or ‘columns’.
- weights: str or ndarray-like, optional
Only supported for axis=1 or ‘columns’.
- random_state: int or None, default None
Seed for the random number generator (if int), or None. If None, a random seed will be chosen.
- axis: {0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index don’t support axis=1.
- Returns
- Series or DataFrame or Index
A new object of the same type as the caller, containing n items randomly sampled from the caller object.
Examples
>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1
>>> sr = cudf.Series([1, 2, 3, 4, 5])
>>> sr.sample(10, replace=True)
1    4
3    1
2    4
0    5
0    1
4    5
4    1
0    2
0    3
3    2
dtype: int64
>>> df = cudf.DataFrame(
...     {"a": [1, 2], "b": [2, 3], "c": [3, 4], "d": [4, 5]})
>>> df.sample(2, axis=1)
   a  c
0  1  3
1  2  4
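The replace and random_state behavior described above can be illustrated with Python's own random module. This is a hypothetical sketch (sample_positions is not a cuDF function) and it uses a different generator from cuDF, so the drawn positions will not match DataFrame.sample; it only shows what "with replacement" and "seeded for reproducibility" mean.

```python
import random


def sample_positions(n_rows, n, replace=False, random_state=None):
    """Hypothetical sketch: choose n row positions from range(n_rows).

    Uses Python's random module, not cuDF's random number generator,
    so the positions will not match what DataFrame.sample returns.
    """
    rng = random.Random(random_state)  # an int seed makes the draw repeatable
    population = range(n_rows)
    if replace:
        # With replacement, the same position may be drawn more than once.
        return rng.choices(population, k=n)
    if n > n_rows:
        raise ValueError("cannot sample more rows than exist when replace=False")
    # Without replacement, every position appears at most once.
    return rng.sample(population, n)


first = sample_positions(5, 3, random_state=42)
second = sample_positions(5, 3, random_state=42)
print(first == second)  # True: same seed, same draw
```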
-
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)
Scatter the rows into a list of DataFrames.
Uses map_index to determine the destination of each row of the original DataFrame.
- Parameters
- map_index: Series, str or list-like
Scatter assignment for each row.
- map_size: int
Length of the output list. Must be greater than or equal to the number of unique values in map_index.
- keep_index: bool
Preserve the original index values for each row.
- Returns
- A list of cudf.DataFrame objects.
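A minimal CPU sketch of the scatter semantics on plain lists, assuming map_index holds integer destinations in the range [0, map_size). The helper name and the (position, row) output format are illustrative only; the kept positions play the role of keep_index=True.

```python
def scatter_by_map(rows, map_index, map_size=None):
    """Hypothetical sketch: append row i to output bucket map_index[i].

    Each bucket keeps (original_position, row) pairs, analogous to
    keep_index=True preserving the original index values.
    """
    if len(rows) != len(map_index):
        raise ValueError("map_index must have one entry per row")
    if map_size is None:
        map_size = max(map_index) + 1 if map_index else 0
    buckets = [[] for _ in range(map_size)]
    for pos, (row, dest) in enumerate(zip(rows, map_index)):
        if not 0 <= dest < map_size:
            raise ValueError(f"destination {dest} is outside [0, {map_size})")
        buckets[dest].append((pos, row))
    return buckets


print(scatter_by_map(["a", "b", "c", "d"], [1, 0, 1, 2]))
# [[(1, 'b')], [(0, 'a'), (2, 'c')], [(3, 'd')]]
```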
-
searchsorted(values, side='left', ascending=True, na_position='last')
Find indices where elements should be inserted to maintain order.
- Parameters
- values: Frame (shape must be consistent with self)
Values to be hypothetically inserted into self.
- side: str {‘left’, ‘right’}, optional, default ‘left’
If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.
- ascending: bool, optional, default True
Sorted Frame is in ascending order (otherwise descending).
- na_position: str {‘last’, ‘first’}, optional, default ‘last’
Position of null values in sorted order.
- Returns
- 1-D cupy array of insertion points
Examples
>>> import cudf
>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)
If the values are not monotonically sorted, wrong locations may be returned:
>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0  # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
...                             'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
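The side='left' / side='right' distinction for a single ascending column matches Python's bisect_left and bisect_right. The sketch below is a CPU illustration of that rule (the helper returns a list rather than a CuPy array, and it does not handle nulls or descending order); it also reproduces the misleading result on unsorted input shown above.

```python
from bisect import bisect_left, bisect_right


def searchsorted(sorted_values, values, side="left"):
    """Sketch of searchsorted for one ascending column using the stdlib bisect module.

    sorted_values must already be in ascending order; otherwise the returned
    positions are meaningless and no error is raised.
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    find = bisect_left if side == "left" else bisect_right
    return [find(sorted_values, v) for v in values]


print(searchsorted([1, 2, 3], [1, 3], side="left"))   # [0, 2]
print(searchsorted([1, 2, 3], [1, 3], side="right"))  # [1, 3]
print(searchsorted([2, 1, 3], [1]))                   # [0], wrong because input is unsorted
```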
-
select_dtypes(include=None, exclude=None)
Return a subset of the DataFrame’s columns based on the column dtypes.
- Parameters
- include: str or list
Which columns to include, based on dtype.
- exclude: str or list
Which columns to exclude, based on dtype.
- Returns
- DataFrame
The subset of the frame including the dtypes in include and excluding the dtypes in exclude.
- Raises
- ValueError
If both include and exclude are empty.
If include and exclude have overlapping elements.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2] * 3,
...                      'b': [True, False] * 3,
...                      'c': [1.0, 2.0] * 3})
>>> df
   a      b    c
0  1   True  1.0
1  2  False  2.0
2  1   True  1.0
3  2  False  2.0
4  1   True  1.0
5  2  False  2.0
>>> df.select_dtypes(include='bool')
       b
0   True
1  False
2   True
3  False
4   True
5  False
>>> df.select_dtypes(include=['float64'])
     c
0  1.0
1  2.0
2  1.0
3  2.0
4  1.0
5  2.0
>>> df.select_dtypes(exclude=['int'])
       b    c
0   True  1.0
1  False  2.0
2   True  1.0
3  False  2.0
4   True  1.0
5  False  2.0
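The include/exclude rules and the two ValueError conditions listed above can be sketched over a plain mapping of column names to dtype names. This hypothetical helper matches dtype names exactly, which is simpler than cuDF: the example above shows cuDF also accepting a generic name such as 'int'.

```python
def select_columns_by_dtype(dtypes, include=None, exclude=None):
    """Hypothetical sketch: dtypes maps column name -> dtype name (e.g. 'int64').

    Mirrors the documented rules: at least one of include/exclude must be
    non-empty, and they must not overlap. Matching is by exact dtype name.
    """
    include = [include] if isinstance(include, str) else list(include or [])
    exclude = [exclude] if isinstance(exclude, str) else list(exclude or [])
    if not include and not exclude:
        raise ValueError("at least one of include or exclude must be non-empty")
    if set(include) & set(exclude):
        raise ValueError("include and exclude overlap")
    return [
        name
        for name, dt in dtypes.items()
        if (not include or dt in include) and dt not in exclude
    ]


cols = {"a": "int64", "b": "bool", "c": "float64"}
print(select_columns_by_dtype(cols, exclude=["int64"]))  # ['b', 'c']
```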
-
set_index(index, drop=True)
Return a new DataFrame with a new index.
- Parameters
- index: Index, Series-convertible, str, or list of str
Index: the new index.
Series-convertible: values for the new index.
str: name of the column whose values become the new index.
list of str: names of the columns to be converted to a MultiIndex.
- drop: boolean
Whether to drop the corresponding column(s) when index is given as a str or list of str.
-
property
shape
Returns a tuple representing the dimensionality of the DataFrame.
-
shift(periods=1, freq=None, axis=0, fill_value=None)
Shift values by the given number of positions (periods).
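The entry above does not spell out direction or fill behavior. Assuming pandas-compatible semantics (positive periods move values toward later positions, and vacated slots get fill_value, or null by default), a list-based sketch looks like this; freq and axis are not modeled, and the helper is hypothetical.

```python
def shift(values, periods=1, fill_value=None):
    """Sketch of pandas-style shift on a list; None stands in for null.

    Positive periods move values toward the end (later positions),
    negative periods toward the start; vacated slots get fill_value.
    """
    n = len(values)
    if abs(periods) >= n:
        return [fill_value] * n
    if periods >= 0:
        return [fill_value] * periods + list(values[: n - periods])
    return list(values[-periods:]) + [fill_value] * (-periods)


print(shift([1, 2, 3, 4], 1))   # [None, 1, 2, 3]
print(shift([1, 2, 3, 4], -1))  # [2, 3, 4, None]
```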
-
sin()
Get Trigonometric sine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.sin()
0    0.000000
1    0.318683
2    0.479426
3    0.850904
4    0.893997
5   -0.801153
6    0.958916
dtype: float64
sin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.sin()
      first    second
0  0.000000 -0.506366
1 -0.958924  0.958916
2 -0.544021 -0.544072
3  0.650288 -0.999756
sin operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.sin()
Float64Index([-0.3894183423086505, -0.5063656411097588,
              0.8011526357338306, 0.8939966636005579],
             dtype='float64')
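The example outputs show that inputs are interpreted as radians: sin(90) is about 0.894, not 1. The stdlib check below reproduces that value with math.sin; converting degrees with math.radians first is one way to work in degrees. This is plain Python, not cuDF, and is only meant to explain the numbers above.

```python
import math

# math.sin, like the cuDF outputs above, treats its argument as radians.
print(round(math.sin(90), 6))                # 0.893997, matching ser.sin() at index 4
print(round(math.sin(math.radians(90)), 6))  # 1.0: degrees converted to radians first
```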
-
property
size
Return the number of elements in the underlying data.
- Returns
- size: int
Size of the DataFrame / Index / Series / MultiIndex.
Examples
Size of an empty dataframe is 0.
>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0
DataFrame with values:
>>> df = cudf.DataFrame({'a': [10, 11, 12],
...                      'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3
Size of an Index:
>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Int64Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4
Size of a MultiIndex:
>>> midx = cudf.MultiIndex(
...     levels=[["a", "b", "c", None], ["1", None, "5"]],
...     codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...     names=["x", "y"],
... )
>>> midx
MultiIndex(levels=[0       a
1       b
2       c
3    None
dtype: object, 0       1
1    None
2       5
dtype: object],
codes=   x  y
0  0  0
1  0  2
2  1  1
3  2  1
4  3  0)
>>> midx.size
5
-
skew(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
Return unbiased Fisher-Pearson skew of a sample.
- Parameters
- skipna: bool, default True
Exclude NA/null values when computing the result.
- Returns
- Series
Notes
Parameters currently not supported are axis, level and numeric_only.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [3, 2, 3, 4], 'b': [7, 8, 10, 10]})
>>> df.skew()
a    0.00000
b   -0.37037
dtype: float64
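"Unbiased Fisher-Pearson skew" is commonly the adjusted coefficient G1 = g1 * sqrt(n(n-1)) / (n-2), where g1 = m3 / m2^1.5 uses the divide-by-n central moments. A stdlib sketch of that formula, assumed to be what cuDF computes, reproduces both values in the example above. The handling of constant input is an assumption of this sketch.

```python
import math


def unbiased_skew(xs):
    """Adjusted Fisher-Pearson skewness G1 = g1 * sqrt(n(n-1)) / (n-2).

    g1 = m3 / m2**1.5, where m2 and m3 are the biased (divide-by-n)
    second and third central moments. Requires n >= 3.
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    if m2 == 0:
        return 0.0  # assumption for this sketch: constant input gives 0
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


print(round(unbiased_skew([3, 2, 3, 4]), 5))    # 0.0
print(round(unbiased_skew([7, 8, 10, 10]), 5))  # -0.37037
```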
-
sort_index(axis=0, level=None, ascending=True, inplace=False, kind=None, na_position='last', sort_remaining=True, ignore_index=False)
Sort object by labels (along an axis).
- Parameters
- axis: {0 or ‘index’, 1 or ‘columns’}, default 0
The axis along which to sort. The value 0 identifies the rows, and 1 identifies the columns.
- level: int or level name or list of ints or list of level names
If not None, sort on values in specified index level(s). This is only useful in the case of MultiIndex.
- ascending: bool, default True
Sort ascending vs. descending.
- inplace: bool, default False
If True, perform operation in-place.
- kind: sorting method such as quicksort and others.
Not yet supported.
- na_position: {‘first’, ‘last’}, default ‘last’
‘first’ puts NaNs at the beginning; ‘last’ puts NaNs at the end.
- sort_remaining: bool, default True
Not yet supported.
- ignore_index: bool, default False
If True, the index will be replaced with a RangeIndex.
- Returns
- DataFrame or None
Examples
>>> import cudf
>>> df = cudf.DataFrame(
...     {"b": [3, 2, 1], "a": [2, 1, 3]}, index=[1, 3, 2])
>>> df.sort_index(axis=0)
   b  a
1  3  2
2  1  3
3  2  1
>>> df.sort_index(axis=1)
   a  b
1  2  3
3  1  2
2  3  1
-
sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False)
Sort by the values row-wise.
- Parameters
- by: str or list of str
Name or list of names to sort by.
- ascending: bool or list of bool, default True
Sort ascending vs. descending. Specify a list for multiple sort orders. If this is a list of bools, it must match the length of by.
- na_position: {‘first’, ‘last’}, default ‘last’
‘first’ puts nulls at the beginning, ‘last’ puts nulls at the end.
- ignore_index: bool, default False
If True, the original index is not carried over to the result, which is labeled 0, 1, …, n - 1.
- Returns
- sorted_obj: cuDF DataFrame
Notes
- Difference from pandas:
Supports axis=‘index’ only.
Does not support: inplace, kind.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [0, 1, 2], 'b': [-3, 2, 0]})
>>> print(df.sort_values('b'))
   a  b
0  0 -3
2  2  0
1  1  2
-
sqrt()
Get the non-negative square-root of all elements, element-wise.
- Returns
- DataFrame/Series/Index
Result of the non-negative square-root of each element.
Examples
>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64
sqrt operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-10.0, 100, 625],
...                      'second': [1, 2, 0.4]})
>>> df
   first  second
0  -10.0     1.0
1  100.0     2.0
2  625.0     0.4
>>> df.sqrt()
   first    second
0    NaN  1.000000
1   10.0  1.414214
2   25.0  0.632456
sqrt operation on Index:
>>> index = cudf.Index([-10.0, 100, 625])
>>> index
Float64Index([-10.0, 100.0, 625.0], dtype='float64')
>>> index.sqrt()
Float64Index([nan, 10.0, 25.0], dtype='float64')
-
stack(level=-1, dropna=True)
Stack the prescribed level(s) from columns to index.
Return a reshaped Series.
- Parameters
- dropna: bool, default True
Whether to drop rows in the resulting Series with missing values.
- Returns
- The stacked cudf.Series
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [0, 1, 3], 'b': [1, 2, 4]})
>>> df.stack()
0  a    0
   b    1
1  a    1
   b    2
2  a    3
   b    4
dtype: int64
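The row-major ordering in the output above, each row label paired with every column name, can be sketched with plain dicts and lists. This hypothetical helper returns ((row_label, column_name), value) pairs instead of a Series with a MultiIndex, and None stands in for null.

```python
def stack(columns, index, dropna=True):
    """Sketch: columns is an ordered dict {column_name: list_of_values}.

    Emits ((row_label, column_name), value) pairs in row-major order, the
    order shown by DataFrame.stack(). With dropna=True, None values are skipped.
    """
    out = []
    for i, row_label in enumerate(index):
        for name, values in columns.items():
            value = values[i]
            if dropna and value is None:
                continue
            out.append(((row_label, name), value))
    return out


print(stack({"a": [0, 1, 3], "b": [1, 2, 4]}, index=[0, 1, 2]))
```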
-
std(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)
Return sample standard deviation of the DataFrame.
Normalized by N-1 by default. This can be changed using the ddof argument.
- Parameters
- skipna: bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- ddof: int, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
- Returns
- Series
Notes
Parameters currently not supported are axis, level and numeric_only.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.std()
a    1.290994
b    1.290994
dtype: float64
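The ddof rule (divide the sum of squared deviations by N - ddof) applies to both std and var. The stdlib sketch below reproduces the 1.290994 above and shows the population value obtained with ddof=0. Returning NaN when N - ddof is not positive is an assumption of this sketch.

```python
import math


def variance(xs, ddof=1):
    """Sketch: sum of squared deviations from the mean, divided by N - ddof."""
    n = len(xs)
    if n - ddof <= 0:
        return float("nan")  # assumption: an undefined divisor gives NaN here
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - ddof)


def std(xs, ddof=1):
    """Standard deviation as the square root of variance(xs, ddof)."""
    return math.sqrt(variance(xs, ddof))


print(round(std([1, 2, 3, 4]), 6))         # 1.290994 (sample, ddof=1)
print(round(variance([1, 2, 3, 4]), 6))    # 1.666667
print(variance([1, 2, 3, 4], ddof=0))      # 1.25 (population variance)
```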
-
sub(other, axis='columns', level=None, fill_value=None)
Get Subtraction of dataframe and other, element-wise (binary operator sub).
Equivalent to dataframe - other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rsub. Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- other: scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- fill_value: float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing, the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'angles': [0, 3, 4],
...                      'degrees': [360, 180, 360]},
...                     index=['circle', 'triangle', 'rectangle'])
>>> df.sub(1)
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359
>>> df.sub([1, 2])
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
-
sum(axis=None, skipna=None, dtype=None, level=None, numeric_only=None, min_count=0, **kwargs)
Return sum of the values in the DataFrame.
- Parameters
- skipna: bool, default True
Exclude NA/null values when computing the result.
- dtype: data type
Data type to cast the result to.
- min_count: int, default 0
The required number of valid values to perform the operation. If fewer than min_count non-NA values are present, the result will be NA.
With the default of 0, the sum of an all-NA or empty Series is 0.
- Returns
- Series
Notes
Parameters currently not supported are axis, level, numeric_only.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.sum()
a    10
b    34
dtype: int64
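The interaction of skipna and min_count described above can be sketched for a single column. None stands in for NA, and the helper is hypothetical; it is not cuDF's reduction code.

```python
def sum_with_min_count(values, skipna=True, min_count=0):
    """Sketch of one column's sum under the skipna/min_count rules.

    None stands in for NA. Returns None (NA) when skipna is False and a
    missing value is present, or when fewer than min_count valid values exist.
    """
    valid = [v for v in values if v is not None]
    if not skipna and len(valid) != len(values):
        return None
    if len(valid) < min_count:
        return None
    return sum(valid)  # sum([]) is 0, so an all-NA column sums to 0 by default


print(sum_with_min_count([None, None]))               # 0
print(sum_with_min_count([None, None], min_count=1))  # None
```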
-
tail(n=5)
Returns the last n rows as a new DataFrame.
Examples
>>> import cudf
>>> df = cudf.DataFrame()
>>> df['key'] = [0, 1, 2, 3, 4]
>>> df['val'] = [float(i + 10) for i in range(5)]  # insert column
>>> print(df.tail(2))
   key   val
3    3  13.0
4    4  14.0
-
take(positions, keep_index=True)
Return a new DataFrame containing the rows specified by positions.
- Parameters
- positions: array-like
Integer or boolean array-like specifying the rows of the output. If integer, each element represents the integer index of a row. If boolean, positions must be of the same length as self, and is treated as a boolean mask.
- Returns
- out: DataFrame
New DataFrame
Examples
>>> import cudf
>>> a = cudf.DataFrame({'a': [1.0, 2.0, 3.0],
...                     'b': cudf.Series(['a', 'b', 'c'])})
>>> a.take([0, 2, 2])
     a  b
0  1.0  a
2  3.0  c
2  3.0  c
>>> a.take([True, False, True])
     a  b
0  1.0  a
2  3.0  c
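The two ways positions are interpreted (integer positions, which may repeat, versus a same-length boolean mask) can be sketched on a plain list. This hypothetical helper returns (original_position, row) pairs so the kept index labels match the example above; bool is checked first because in Python bool is a subclass of int.

```python
def take(rows, positions):
    """Hypothetical sketch of take() on a plain list of rows.

    An all-bool positions list is treated as a mask (same length as rows);
    otherwise each entry is an integer row position, and repeats are allowed.
    Returns (original_position, row) pairs so the kept index is visible.
    """
    if positions and all(isinstance(p, bool) for p in positions):
        if len(positions) != len(rows):
            raise ValueError("a boolean mask must have the same length as rows")
        return [(i, r) for i, (r, keep) in enumerate(zip(rows, positions)) if keep]
    return [(p, rows[p]) for p in positions]


print(take(["x", "y", "z"], [0, 2, 2]))           # [(0, 'x'), (2, 'z'), (2, 'z')]
print(take(["x", "y", "z"], [True, False, True]))  # [(0, 'x'), (2, 'z')]
```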
-
tan()
Get Trigonometric tangent, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.tan()
0    0.000000
1    0.336213
2    0.546302
3    1.619775
4   -1.995200
5    1.338690
6   -3.380140
dtype: float64
tan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.tan()
      first     second
0  0.000000  -0.587214
1 -3.380515  -3.380140
2  0.648361   0.648446
3 -0.855993  45.244742
tan operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.tan()
Float64Index([-0.4227932187381618, -0.587213915156929,
              -1.3386902103511544, -1.995200412208242],
             dtype='float64')
-
tile(count)
Repeat the rows of this DataFrame count times to form a new DataFrame.
- Parameters
- self: input table containing the columns to tile.
- count: number of times to tile the rows. Must be non-negative.
- Returns
- The table containing the tiled rows.
Examples
>>> import cudf
>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
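The tiling shown above, where the whole block of rows repeats count times and the original index labels repeat with it, can be sketched on a list of rows. The helper is hypothetical and returns (label, row) pairs.

```python
def tile_rows(rows, count):
    """Sketch: repeat the full block of rows count times, keeping row labels."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return [(label, row) for _ in range(count) for label, row in enumerate(rows)]


tiled = tile_rows([[8, 4, 7], [5, 2, 3]], 2)
print([label for label, _ in tiled])  # [0, 1, 0, 1], as in the index above
```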
-
to_arrow(preserve_index=True)
Convert to a PyArrow Table.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [0, 1, 2], 'b': [-3, 2, 0]})
>>> df.to_arrow()
pyarrow.Table
None: int64
a: int64
b: int64
-
to_csv(path=None, sep=',', na_rep='', columns=None, header=True, index=True, line_terminator='\n', chunksize=None)
Write a DataFrame to CSV file format.
- Parameters
- df: DataFrame
DataFrame object to be written to CSV.
- path: str, default None
Path of file where DataFrame will be written.
- sep: char, default ‘,’
Delimiter to be used.
- na_rep: str, default ‘’
String to use for null entries.
- columns: list of str, optional
Columns to write.
- header: bool, default True
Write out the column names.
- index: bool, default True
Write out the index as a column.
- line_terminator: char, default ‘\n’
Character used to terminate each line.
- chunksize: int or None, default None
Rows to write at a time.
Notes
Follows the standard of Pandas csv.QUOTE_NONNUMERIC for all output.
If to_csv leads to memory errors, consider setting the chunksize argument.
Examples
Write a DataFrame to CSV.
>>> import cudf
>>> filename = 'foo.csv'
>>> df = cudf.DataFrame({'x': [0, 1, 2, 3],
...                      'y': [1.0, 3.3, 2.2, 4.4],
...                      'z': ['a', 'b', 'c', 'd']})
>>> df = df.set_index([3, 2, 1, 0])
>>> df.to_csv(filename)
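The quoting note above refers to Python's csv.QUOTE_NONNUMERIC rule: every non-numeric field is quoted and numbers are written bare. The stdlib snippet below shows that rule on its own. It does not call cuDF, and whether cuDF's writer output matches it byte for byte is an assumption this sketch does not check.

```python
import csv
import io

# Python's csv module with QUOTE_NONNUMERIC: strings are quoted, numbers are not.
buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
writer.writerow(["x", "y", "z"])  # header fields are strings, so all are quoted
writer.writerow([0, 1.0, "a"])
writer.writerow([1, 3.3, "b"])
print(buf.getvalue())
```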
-
to_dlpack()
Converts a cuDF object into a DLPack tensor.
DLPack is an open-source memory tensor structure: dmlc/dlpack.
This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep-copies the data from the cuDF object into the DLPack tensor.
- Parameters
- cudf_obj: DataFrame, Series, Index, or Column
- Returns
- pycapsule_obj: PyCapsule
Output DLPack tensor pointer which is encapsulated in a PyCapsule object.
-
to_feather(path, *args, **kwargs)
Write a DataFrame to the feather format.
- Parameters
- path: str
File path
-
to_gpu_matrix()
Convert to a Numba GPU ndarray.
- Returns
- Numba GPU ndarray
-
to_hdf(path_or_buf, key, *args, **kwargs)
Write the contained data to an HDF5 file using HDFStore.
Hierarchical Data Format (HDF) is self-describing, allowing an application to interpret the structure and contents of a file with no outside information. One HDF file can hold a mix of related objects which can be accessed as a group or as individual objects.
In order to add another DataFrame or Series to an existing HDF file, please use append mode and a different key.
For more information, see the user guide.
- Parameters
- path_or_buf: str or pandas.HDFStore
File path or HDFStore object.
- key: str
Identifier for the group in the store.
- mode: {‘a’, ‘w’, ‘r+’}, default ‘a’
Mode to open file:
‘w’: write, a new file is created (an existing file with the same name would be deleted).
‘a’: append, an existing file is opened for reading and writing, and if the file does not exist it is created.
‘r+’: similar to ‘a’, but the file must already exist.
- format: {‘fixed’, ‘table’}, default ‘fixed’
Possible values:
‘fixed’: Fixed format. Fast writing/reading. Neither appendable nor searchable.
‘table’: Table format. Write as a PyTables Table structure, which may perform worse but allows more flexible operations like searching / selecting subsets of the data.
- append: bool, default False
For Table formats, append the input data to the existing data.
- data_columns: list of columns or True, optional
List of columns to create as indexed data columns for on-disk queries, or True to use all columns. By default only the axes of the object are indexed. See Query via Data Columns. Applicable only to format=‘table’.
- complevel: {0-9}, optional
Specifies a compression level for data. A value of 0 disables compression.
- complib: {‘zlib’, ‘lzo’, ‘bzip2’, ‘blosc’}, default ‘zlib’
Specifies the compression library to be used. As of v0.20.2 these additional compressors for Blosc are supported (default if no compressor specified: ‘blosc:blosclz’): {‘blosc:blosclz’, ‘blosc:lz4’, ‘blosc:lz4hc’, ‘blosc:snappy’, ‘blosc:zlib’, ‘blosc:zstd’}. Specifying a compression library which is not available issues a ValueError.
- fletcher32: bool, default False
If applying compression, use the fletcher32 checksum.
- dropna: bool, default False
If True, rows that are entirely NaN will not be written to the store.
- errors: str, default ‘strict’
Specifies how encoding and decoding errors are to be handled. See the errors argument for open() for a full list of options.
See also
cudf.io.hdf.read_hdf: Read from HDF file.
cudf.io.parquet.to_parquet: Write a DataFrame to the binary parquet format.
cudf.io.feather.to_feather: Write out feather-format for DataFrames.
-
to_json(path_or_buf=None, *args, **kwargs)
Convert the cuDF object to a JSON string. Note that nulls and NaNs will be converted to null, and datetime objects will be converted to UNIX timestamps.
- Parameters
- path_or_buf: string or file handle, optional
File path or object. If not specified, the result is returned as a string.
- orient: string
Indication of expected JSON string format.
- Series
default is ‘index’
allowed values are: {‘split’, ‘records’, ‘index’, ‘table’}
- DataFrame
default is ‘columns’
allowed values are: {‘split’, ‘records’, ‘index’, ‘columns’, ‘values’, ‘table’}
- The format of the JSON string
‘split’ : dict like {‘index’ -> [index], ‘columns’ -> [columns], ‘data’ -> [values]}
‘records’ : list like [{column -> value}, … , {column -> value}]
‘index’ : dict like {index -> {column -> value}}
‘columns’ : dict like {column -> {index -> value}}
‘values’ : just the values array
‘table’ : dict like {‘schema’: {schema}, ‘data’: {data}} describing the data, where the data component is like orient='records'.
- date_format: {None, ‘epoch’, ‘iso’}
Type of date conversion. ‘epoch’ = epoch milliseconds, ‘iso’ = ISO8601. The default depends on the orient. For orient='table', the default is ‘iso’. For all other orients, the default is ‘epoch’.
- double_precision: int, default 10
The number of decimal places to use when encoding floating point values.
- force_ascii: bool, default True
Force encoded string to be ASCII.
- date_unit: string, default ‘ms’ (milliseconds)
The time unit to encode to, governs timestamp and ISO8601 precision. One of ‘s’, ‘ms’, ‘us’, ‘ns’ for second, millisecond, microsecond, and nanosecond respectively.
- default_handler: callable, default None
Handler to call if object cannot otherwise be converted to a suitable format for JSON. Should receive a single argument which is the object to convert and return a serializable object.
- lines: bool, default False
If ‘orient’ is ‘records’, write out line-delimited JSON format. Raises ValueError for any other ‘orient’, since the others are not list-like.
- compression: {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}
A string representing the compression to use in the output file, only used when the first argument is a filename. By default, the compression is inferred from the filename.
- index: bool, default True
Whether to include the index values in the JSON string. Not including the index (index=False) is only supported when orient is ‘split’ or ‘table’.
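The orient layouts listed above can be reproduced with the stdlib json module on a tiny, hypothetical two-row frame held as plain Python lists. This only mirrors the documented shapes (‘table’ is omitted). It does not call cuDF, and cuDF's exact formatting of numbers and dates may differ.

```python
import json

# Hypothetical two-row frame: index [0, 1], columns 'a' and 'b'.
index = [0, 1]
columns = {"a": [1, 2], "b": ["x", "y"]}


def orient_layout(orient):
    """Build the Python structure each documented orient describes."""
    names = list(columns)
    rows = [[columns[c][i] for c in names] for i in range(len(index))]
    if orient == "split":
        return {"index": index, "columns": names, "data": rows}
    if orient == "records":
        return [dict(zip(names, r)) for r in rows]
    if orient == "index":
        return {i: dict(zip(names, r)) for i, r in zip(index, rows)}
    if orient == "columns":
        return {c: dict(zip(index, columns[c])) for c in names}
    if orient == "values":
        return rows
    raise ValueError(f"unsupported orient in this sketch: {orient}")


for orient in ("split", "records", "index", "columns", "values"):
    # json.dumps turns the integer index keys into JSON strings.
    print(orient, json.dumps(orient_layout(orient)))
```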
-
to_orc(fname, compression=None, *args, **kwargs)
Write a DataFrame to the ORC format.
- Parameters
- fname: str
File path or object where the ORC dataset will be stored.
- compression: {‘snappy’, None}, default None
Name of the compression to use. Use None for no compression.
- enable_statistics: boolean, default True
Enable writing column statistics.
-
to_pandas(**kwargs)
Convert to a Pandas DataFrame.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [0, 1, 2], 'b': [-3, 2, 0]})
>>> pdf = df.to_pandas()
>>> pdf
   a  b
0  0 -3
1  1  2
2  2  0
>>> type(pdf)
<class 'pandas.core.frame.DataFrame'>
-
to_parquet(path, *args, **kwargs)
Write a DataFrame to the parquet format.
- Parameters
- path: str
File path or root directory path. Used as the root directory path when writing a partitioned dataset.
- compression: {‘snappy’, ‘gzip’, ‘brotli’, None}, default ‘snappy’
Name of the compression to use. Use None for no compression.
- index: bool, default None
If True, include the dataframe’s index(es) in the file output. If False, they will not be written to the file. If None, the engine’s default behavior will be used.
- partition_cols: list, optional, default None
Column names by which to partition the dataset. Columns are partitioned in the order they are given.
-
to_records(index=True)
Convert to a NumPy recarray.
- Parameters
- index: bool
Whether to include the index in the output.
- Returns
- NumPy recarray
-
to_string()
Convert to string.
cuDF uses Pandas internals for efficient string formatting. Set formatting options using pandas string formatting options and cuDF objects will print identically to Pandas objects.
cuDF supports null/None as a value in any column type, which is transparently supported during this output process.
Examples
>>> import cudf
>>> df = cudf.DataFrame()
>>> df['key'] = [0, 1, 2]
>>> df['val'] = [float(i + 10) for i in range(3)]
>>> df.to_string()
'   key   val\n0    0  10.0\n1    1  11.0\n2    2  12.0'
-
transpose()
Transpose index and columns.
- Returns
- A new (ncol x nrow) DataFrame, where self is (nrow x ncol).
Notes
Difference from pandas: the copy parameter is not supported, because the default and only behavior is copy=True.
-
truediv(other, axis='columns', level=None, fill_value=None)
Get Floating division of dataframe and other, element-wise (binary operator truediv).
Equivalent to dataframe / other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rtruediv. Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- other: scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- fill_value: float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing, the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'angles': [0, 3, 4],
...                      'degrees': [360, 180, 360]},
...                     index=['circle', 'triangle', 'rectangle'])
>>> df.truediv(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df / 10
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
-
property
values
Return a CuPy representation of the DataFrame.
Only the values in the DataFrame will be returned; the axis labels will be removed.
- Returns
- out: cupy.ndarray
The values of the DataFrame.
-
var(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)
Return unbiased variance of the DataFrame.
Normalized by N-1 by default. This can be changed using the ddof argument.
- Parameters
- skipna: bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- ddof: int, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
- Returns
- Series
Notes
Parameters currently not supported are axis, level and numeric_only.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.var()
a    1.666667
b    1.666667
dtype: float64
-
where(cond, other=None, inplace=False)
Replace values where the condition is False.
- Parameters
- cond: bool Series/DataFrame, array-like
Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.
- other: scalar, list of scalars, Series/DataFrame
Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.
A DataFrame expects only a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.
A Series expects only a scalar or a Series-like object of the same length.
- inplace: bool, default False
Whether to perform the operation in place on the data.
- Returns
- Same type as caller
Examples
>>> import cudf
>>> df = cudf.DataFrame({"A": [1, 4, 5], "B": [3, 5, 8]})
>>> df.where(df % 2 == 0, [-1, -1])
   A  B
0 -1 -1
1  4 -1
2 -1  8
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.where(ser > 2, 10)
0     4
1     3
2    10
3    10
4    10
dtype: int64
>>> ser.where(ser > 2)
0       4
1       3
2    null
3    null
4    null
dtype: int64
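The element-wise rule for a Series-like input (keep the value where cond is True, otherwise take the replacement from other) can be sketched on plain lists. This hypothetical helper accepts a scalar or a same-length list for other and uses None, the default, to stand in for null.

```python
def where(values, cond, other=None):
    """Sketch: keep values[i] where cond[i] is True, else take the replacement.

    other may be a scalar or a list of the same length; None (the default)
    stands in for null.
    """
    if len(cond) != len(values):
        raise ValueError("cond must have the same length as values")
    if isinstance(other, list):
        if len(other) != len(values):
            raise ValueError("other must have the same length as values")
        replacements = other
    else:
        replacements = [other] * len(values)
    return [v if c else r for v, c, r in zip(values, cond, replacements)]


ser = [4, 3, 2, 1, 0]
print(where(ser, [v > 2 for v in ser], 10))  # [4, 3, 10, 10, 10]
print(where(ser, [v > 2 for v in ser]))      # [4, 3, None, None, None]
```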
Series
-
class
cudf.core.series.Series(data=None, index=None, dtype=None, name=None, nan_as_null=True)
One-dimensional GPU array (including time series).
Labels need not be unique but must be a hashable type. The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index. Statistical methods from ndarray have been overridden to automatically exclude missing data (currently represented as null/NaN).
Operations between Series (+, -, /, *, **) align values based on their associated index values; they need not be the same length. The result index will be the sorted union of the two indexes.
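The alignment rule in the previous paragraph can be sketched with dicts standing in for index-labeled Series. The helper is hypothetical, not cuDF code, and None stands in for null: labels present in only one operand produce a missing result, and the output labels are the sorted union of both indexes.

```python
def aligned_add(left, right):
    """Sketch of index-aligned addition; left/right map index label -> value."""
    out = {}
    for label in sorted(set(left) | set(right)):  # sorted union of the indexes
        a = left.get(label)
        b = right.get(label)
        # A label missing from either side (or a null value) yields a null result.
        out[label] = None if a is None or b is None else a + b
    return out


print(aligned_add({"a": 1, "b": 2}, {"b": 10, "c": 5}))
# {'a': None, 'b': 12, 'c': None}
```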
Series objects are used as columns of DataFrame.
- Parameters
- data: array-like, Iterable, dict, or scalar value
Contains data stored in Series.
- index: array-like or Index (1d)
Values must be hashable and have the same length as data. Non-unique index values are allowed. Will default to RangeIndex (0, 1, 2, …, n) if not provided. If both a dict and index sequence are used, the index will override the keys found in the dict.
- dtype: str, numpy.dtype, or ExtensionDtype, optional
Data type for the output Series. If not specified, this will be inferred from data.
- name: str, optional
The name to give to the Series.
- nan_as_null: bool, default True
If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.
- Attributes
cat: Accessor object for categorical properties of the Series values.
data: The GPU buffer for the data.
dt: Accessor object for datetimelike properties of the Series values.
dtype: dtype of the Series.
empty: Indicator whether DataFrame or Series is empty.
has_nulls: Indicator whether Series contains null values.
iloc: Select values by position.
index: The index object.
is_monotonic: Return boolean if values in the object are monotonic_increasing.
is_monotonic_decreasing: Return boolean if values in the object are monotonic_decreasing.
is_monotonic_increasing: Return boolean if values in the object are monotonic_increasing.
is_unique: Return boolean if values in the object are unique.
list
loc: Select values by label.
name: Returns name of the Series.
ndim: Dimension of the data.
null_count: Number of null values.
nullable: A boolean indicating whether a null-mask is needed.
nullmask: The GPU buffer for the null-mask.
shape: Returns a tuple representing the dimensionality of the Series.
size: Return the number of elements in the underlying data.
str: Vectorized string functions for Series and Index.
valid_count: Number of non-null values.
values: Return a CuPy representation of the Series.
values_host: Return a numpy representation of the Series.
Methods
abs() : Absolute value of each element of the series.
acos() : Get Trigonometric inverse cosine, element-wise.
add(other[, fill_value, axis]) : Addition of series and other, element-wise (binary operator add).
all([axis, bool_only, skipna, level]) : Return whether all elements are True in Series.
any([axis, bool_only, skipna, level]) : Return whether any element is True in Series.
append(to_append[, ignore_index, …]) : Append values from another Series or array-like object.
applymap(udf[, out_dtype]) : Apply an elementwise function to transform the values in the Column.
argsort([ascending, na_position]) : Returns a Series of int64 indices that will sort the series.
as_index() : Returns a new Series with a RangeIndex.
as_mask() : Convert booleans to a bitmask.
asin() : Get Trigonometric inverse sine, element-wise.
astype(dtype[, copy, errors]) : Cast the Series to the given dtype.
atan() : Get Trigonometric inverse tangent, element-wise.
ceil() : Rounds each value upward to the smallest integral value not less than the original.
clip([lower, upper, inplace, axis]) : Trim values at input threshold(s).
copy([deep]) : Make a copy of this object’s indices and data.
corr(other[, method, min_periods]) : Calculates the sample correlation between two Series, excluding missing values.
cos() : Get Trigonometric cosine, element-wise.
count([level]) : Return number of non-NA/null observations in the Series.
cov(other[, min_periods]) : Compute covariance with Series, excluding missing values.
cummax([axis, skipna]) : Return cumulative maximum of the Series.
cummin([axis, skipna]) : Return cumulative minimum of the Series.
cumprod([axis, skipna]) : Return cumulative product of the Series.
cumsum([axis, skipna]) : Return cumulative sum of the Series.
describe([percentiles, include, exclude]) : Compute summary statistics of a Series.
diff([periods]) : Calculate the difference between values at positions i and i - N in an array and store the output in a new array.
digitize(bins[, right]) : Return the indices of the bins to which each value in series belongs.
drop_duplicates([keep, inplace, ignore_index]) : Return Series with duplicate values removed.
dropna([axis, inplace, how]) : Return a Series with null values removed.
eq(other[, fill_value, axis]) : Equal to of series and other, element-wise (binary operator eq).
equals(other) : Test whether two objects contain the same elements.
exp() : Get the exponential of all elements, element-wise.
factorize([na_sentinel]) : Encode the input values as integer labels.
fillna(value[, method, axis, inplace, limit]) : Fill null values with value.
floor() : Rounds each value downward to the largest integral value not greater than the original.
floordiv(other[, fill_value, axis]) : Integer division of series and other, element-wise (binary operator floordiv).
from_arrow(s) : Convert from a PyArrow Array.
from_categorical(categorical[, codes]) : Creates from a pandas.Categorical.
from_masked_array(data, mask[, null_count]) : Create a Series with null-mask.
from_pandas(s[, nan_as_null]) : Convert from a Pandas Series.
ge(other[, fill_value, axis]) : Greater than or equal to of series and other, element-wise (binary operator ge).
groupby([by, group_series, level, sort, …]) : Group Series using a mapper or by a Series of columns.
gt(other[, fill_value, axis]) : Greater than of series and other, element-wise (binary operator gt).
hash_encode(stop[, use_name]) : Encode column values as ints in [0, stop) using hash function.
Compute the hash of values in this column.
head([n]) : Return the first n rows.
Interleave Series columns of a table into a single column.
isin(values) : Check whether values are contained in Series.
isna() : Identify missing values.
isnull() : Identify missing values.
keys() : Return alias for index.
kurt([axis, skipna, level, numeric_only]) : Return Fisher’s unbiased kurtosis of a sample.
kurtosis([axis, skipna, level, numeric_only]) : Return Fisher’s unbiased kurtosis of a sample.
label_encoding(cats[, dtype, na_sentinel]) : Perform label encoding.
le(other[, fill_value, axis]) : Less than or equal to of series and other, element-wise (binary operator le).
log() : Get the natural logarithm of all elements, element-wise.
lt(other[, fill_value, axis]) : Less than of series and other, element-wise (binary operator lt).
mask(cond[, other, inplace]) : Replace values where the condition is True.
max([axis, skipna, dtype, level, numeric_only]) : Return the maximum of the values in the Series.
mean([axis, skipna, level, numeric_only]) : Return the mean of the values in the series.
median([skipna]) : Compute the median of the series.
memory_usage([index, deep]) : Return the memory usage of the Series.
min([axis, skipna, dtype, level, numeric_only]) : Return the minimum of the values in the Series.
mod(other[, fill_value, axis]) : Modulo of series and other, element-wise (binary operator mod).
mul(other[, fill_value, axis]) : Multiplication of series and other, element-wise (binary operator mul).
Convert nans (if any) to nulls.
ne(other[, fill_value, axis]) : Not equal to of series and other, element-wise (binary operator ne).
nlargest([n, keep]) : Returns a new Series of the n largest elements.
notna() : Identify non-missing values.
notnull() : Identify non-missing values.
nsmallest([n, keep]) : Returns a new Series of the n smallest elements.
nunique([method, dropna]) : Returns the number of unique values of the Series: approximate version, and exact version to be moved to libgdf.
one_hot_encoding(cats[, dtype]) : Perform one-hot-encoding.
pow(other[, fill_value, axis]) : Exponential power of series and other, element-wise (binary operator pow).
prod([axis, skipna, dtype, level, …]) : Return product of the values in the series.
product([axis, skipna, dtype, level, …]) : Return product of the values in the Series.
quantile([q, interpolation, exact, quant_index]) : Return values at the given quantile.
radd(other[, fill_value, axis]) : Addition of series and other, element-wise (binary operator radd).
rank([axis, method, numeric_only, …]) : Compute numerical data ranks (1 through n) along axis.
reindex([index, copy]) : Return a Series that conforms to a new index.
rename([index, copy]) : Alter Series name.
repeat(repeats[, axis]) : Repeats elements consecutively.
replace([to_replace, value, inplace, limit, …]) : Replace values given in to_replace with value.
reset_index([drop, inplace]) : Reset index to RangeIndex.
reverse() : Reverse the Series.
rfloordiv(other[, fill_value, axis]) : Integer division of series and other, element-wise (binary operator rfloordiv).
rmod(other[, fill_value, axis]) : Modulo of series and other, element-wise (binary operator rmod).
rmul(other[, fill_value, axis]) : Multiplication of series and other, element-wise (binary operator rmul).
rolling(window[, min_periods, center, axis, …]) : Rolling window calculations.
round([decimals]) : Round a Series to a configurable number of decimal places.
rpow(other[, fill_value, axis]) : Exponential power of series and other, element-wise (binary operator rpow).
rsub(other[, fill_value, axis]) : Subtraction of series and other, element-wise (binary operator rsub).
rtruediv(other[, fill_value, axis]) : Floating division of series and other, element-wise (binary operator rtruediv).
sample([n, frac, replace, weights, …]) : Return a random sample of items from an axis of object.
scale() : Scale values to [0, 1] in float64.
scatter_by_map(map_index[, map_size, keep_index]) : Scatter to a list of dataframes.
searchsorted(values[, side, ascending, …]) : Find indices where elements should be inserted to maintain order.
set_index(index) : Returns a new Series with a different index.
set_mask(mask[, null_count]) : Create new Series by setting a mask array.
shift([periods, freq, axis, fill_value]) : Shift values by periods positions.
sin() : Get Trigonometric sine, element-wise.
skew([axis, skipna, level, numeric_only]) : Return unbiased Fisher-Pearson skew of a sample.
sort_index([ascending]) : Sort by the index.
sort_values([axis, ascending, inplace, …]) : Sort by the values.
sqrt() : Get the non-negative square-root of all elements, element-wise.
std([axis, skipna, level, ddof, numeric_only]) : Return sample standard deviation of the Series.
sub(other[, fill_value, axis]) : Subtraction of series and other, element-wise (binary operator sub).
sum([axis, skipna, dtype, level, …]) : Return sum of the values in the Series.
tail([n]) : Returns the last n rows as a new Series.
take(indices[, keep_index]) : Return Series by taking values from the corresponding indices.
tan() : Get Trigonometric tangent, element-wise.
tile(count) : Repeats the rows from self DataFrame count times to form a new DataFrame.
to_array([fillna]) : Get a dense numpy array for the data.
to_arrow() : Convert Series to a PyArrow Array.
Converts a cuDF object into a DLPack tensor.
to_frame([name]) : Convert Series into a DataFrame.
to_gpu_array([fillna]) : Get a dense numba device array for the data.
to_hdf(path_or_buf, key, *args, **kwargs) : Write the contained data to an HDF5 file using HDFStore.
to_json([path_or_buf]) : Convert the cuDF object to a JSON string.
to_pandas([index]) : Convert to a Pandas Series.
Convert to string
truediv(other[, fill_value, axis]) : Floating division of series and other, element-wise (binary operator truediv).
unique() : Returns unique values of this Series.
value_counts([normalize, sort, ascending, …]) : Return a Series containing counts of unique values.
values_to_string([nrows]) : Returns a list of string for each element.
var([axis, skipna, level, ddof, numeric_only]) : Return unbiased variance of the Series.
where(cond[, other, inplace]) : Replace values where the condition is False.
-
abs()
Absolute value of each element of the series.
Returns a new Series.
-
acos()
Get Trigonometric inverse cosine, element-wise.
The inverse of cos so that, if y = x.cos(), then x = y.acos().
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64
acos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629
acos operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([3.141592653589793, 1.1592794807274085, 0.0, 1.5707963267948966, 1.266103672779499], dtype='float64')
-
add(other, fill_value=None, axis=0)
Addition of series and other, element-wise (binary operator add).
- Parameters
- other : Series or scalar value
- fill_value : None or value
Value to fill nulls with before computation. If data in both corresponding Series locations is null, the result will be null.
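The same fill_value rule is documented for the other binary operators in this class (eq, floordiv, and so on). The plain-Python model below only illustrates which positions end up null; None stands in for a null entry, and binary_op_with_fill is a hypothetical helper, not part of cuDF.

```python
import operator
from typing import Callable, List, Optional


def binary_op_with_fill(
    left: List[Optional[float]],
    right: List[Optional[float]],
    op: Callable[[float, float], float],
    fill_value: Optional[float] = None,
) -> List[Optional[float]]:
    """Element-wise op where None models a null entry (hypothetical helper)."""
    out: List[Optional[float]] = []
    for a, b in zip(left, right):
        if a is None and b is None:
            # Both sides null: the result stays null even when fill_value is given.
            out.append(None)
            continue
        if fill_value is not None:
            # At most one side is null here; replace it before computing.
            a = fill_value if a is None else a
            b = fill_value if b is None else b
        # Without a fill_value, any remaining null propagates to the result.
        out.append(None if a is None or b is None else op(a, b))
    return out


print(binary_op_with_fill([1, None, None], [10, 20, None], operator.add, fill_value=0))
# [11, 20, None]
```

Without fill_value the middle position would also be null, since only one operand is present there.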
-
all(axis=0, bool_only=None, skipna=True, level=None, **kwargs)
Return whether all elements are True in Series.
- Parameters
- skipna : bool, default True
Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be True, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.
- Returns
- scalar
Notes
Parameters currently not supported are axis, bool_only, level.
Examples
>>> import cudf
>>> ser = cudf.Series([1, 5, 2, 4, 3])
>>> ser.all()
True
-
any(axis=0, bool_only=None, skipna=True, level=None, **kwargs)
Return whether any element is True in Series.
- Parameters
- skipna : bool, default True
Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be False, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.
- Returns
- scalar
Notes
Parameters currently not supported are axis, bool_only, level.
Examples
>>> import cudf
>>> ser = cudf.Series([1, 5, 2, 4, 3])
>>> ser.any()
True
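The skipna rules for all and any can be modelled in plain Python, with None standing in for null. This sketch mirrors only the documented behaviour: skipped nulls make an all-null Series behave like an empty one, while skipna=False counts nulls as True. The helper names are hypothetical.

```python
from typing import Iterable, List, Optional


def _prepare(values: Iterable[Optional[object]], skipna: bool) -> List[bool]:
    """Drop nulls (skipna=True) or treat them as True (skipna=False)."""
    if skipna:
        return [bool(v) for v in values if v is not None]
    return [True if v is None else bool(v) for v in values]


def series_all(values: Iterable[Optional[object]], skipna: bool = True) -> bool:
    # all([]) is True, matching the documented result for an all-null Series.
    return all(_prepare(values, skipna))


def series_any(values: Iterable[Optional[object]], skipna: bool = True) -> bool:
    # any([]) is False, matching the documented result for an all-null Series.
    return any(_prepare(values, skipna))


print(series_all([None, None]), series_any([None, None]))  # True False
print(series_any([0, None], skipna=False))                  # True
```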
-
append(to_append, ignore_index=False, verify_integrity=False)
Append values from another Series or array-like object. If ignore_index=True, the index is reset.
- Parameters
- to_append : Series or list/tuple of Series
Series to append with self.
- ignore_index : bool, default False
If True, do not use the index.
- verify_integrity : bool, default False
This parameter is currently not supported.
- Returns
- Series
A new concatenated series.
See also
cudf.concat : General function to concatenate DataFrame or Series objects.
Examples
>>> import cudf
>>> s1 = cudf.Series([1, 2, 3])
>>> s2 = cudf.Series([4, 5, 6])
>>> s1
0    1
1    2
2    3
dtype: int64
>>> s2
0    4
1    5
2    6
dtype: int64
>>> s1.append(s2)
0    1
1    2
2    3
0    4
1    5
2    6
dtype: int64
>>> s3 = cudf.Series([4, 5, 6], index=[3, 4, 5])
>>> s3
3    4
4    5
5    6
dtype: int64
>>> s1.append(s3)
0    1
1    2
2    3
3    4
4    5
5    6
dtype: int64
With ignore_index set to True:
>>> s1.append(s2, ignore_index=True)
0    1
1    2
2    3
3    4
4    5
5    6
dtype: int64
-
applymap(udf, out_dtype=None)
Apply an elementwise function to transform the values in the Column.
The user function is expected to take one argument and return the result, which will be stored to the output Series. The function cannot reference globals except for other simple scalar objects.
- Parameters
- udf : function
Either a callable Python function or a Python function already decorated by numba.cuda.jit for call on the GPU as a device function.
- out_dtype : numpy.dtype, optional
The dtype for use in the output. Only used for a numba.cuda.jit decorated udf. By default, the result will have the same dtype as the source.
- Returns
- result : Series
The mask and index are preserved.
Notes
The supported Python features are those listed in the Numba CUDA documentation, with these exceptions:
Math functions in cmath are not supported since libcudf does not have complex number support and the output of cmath functions is most likely complex.
These five functions in math are not supported since numba generates multiple PTX functions from them:
math.sin()
math.cos()
math.tan()
math.gamma()
math.lgamma()
Series with string dtypes are not supported in the applymap method.
Global variables need to be redefined explicitly inside the udf, as numba considers them to be compile-time constants and there is no known way to obtain the value of the global variable.
Examples
Applying lambdas and named functions element-wise; the second call returns a Series of booleans:
>>> import cudf
>>> s = cudf.Series([1, 10, -10, 200, 100])
>>> s.applymap(lambda x: x)
0      1
1     10
2    -10
3    200
4    100
dtype: int64
>>> s.applymap(lambda x: x in [1, 100, 59])
0     True
1    False
2    False
3    False
4     True
dtype: bool
>>> s.applymap(lambda x: x ** 2)
0        1
1      100
2      100
3    40000
4    10000
dtype: int64
>>> s.applymap(lambda x: (x ** 2) + (x / 2))
0        1.5
1      105.0
2       95.0
3    40100.0
4    10050.0
dtype: float64
>>> def cube_function(a):
...     return a ** 3
...
>>> s.applymap(cube_function)
0          1
1       1000
2      -1000
3    8000000
4    1000000
dtype: int64
>>> def custom_udf(x):
...     if x > 0:
...         return x + 5
...     else:
...         return x - 5
...
>>> s.applymap(custom_udf)
0      6
1     15
2    -15
3    205
4    105
dtype: int64
-
argsort(ascending=True, na_position='last')
Returns a Series of int64 indices that will sort the series.
Uses Thrust sort.
- Returns
- result : Series
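To make ascending and na_position concrete, here is a plain-Python model of argsort over a list where None marks a null. It relies on Python's stable sort, so tied values keep their input order; whether cuDF's Thrust-based sort is stable is not stated above, so treat tie order as an assumption. argsort_with_nulls is a hypothetical name.

```python
from typing import List, Optional, Sequence


def argsort_with_nulls(values: Sequence[Optional[float]], ascending: bool = True,
                       na_position: str = "last") -> List[int]:
    """Positions that would sort values; None (null) entries grouped at one end."""
    if na_position not in ("first", "last"):
        raise ValueError("na_position must be 'first' or 'last'")
    valid = [i for i, v in enumerate(values) if v is not None]
    nulls = [i for i, v in enumerate(values) if v is None]
    # list.sort is stable, also with reverse=True, so ties keep input order.
    valid.sort(key=lambda i: values[i], reverse=not ascending)
    return valid + nulls if na_position == "last" else nulls + valid


print(argsort_with_nulls([3.0, None, 1.0, 2.0]))  # [2, 3, 0, 1]
```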
-
as_index()
Returns a new Series with a RangeIndex.
Examples
>>> s = cudf.Series([1,2,3], index=['a','b','c'])
>>> s
a    1
b    2
c    3
dtype: int64
>>> s.as_index()
0    1
1    2
2    3
dtype: int64
-
as_mask()
Convert booleans to a bitmask.
- Returns
- device array
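The bitmask layout is the one documented for from_masked_array: the bit for element idx is bit idx % 8 of byte idx // 8, and a set bit means True. A host-side sketch of that packing in plain Python follows; pack_bitmask is a hypothetical helper, and any padding or alignment of the real device buffer is ignored.

```python
from typing import Sequence


def pack_bitmask(flags: Sequence[bool]) -> bytearray:
    """Pack booleans into bytes, least-significant bit first (hypothetical helper)."""
    mask = bytearray((len(flags) + 7) // 8)  # one bit per element, rounded up to whole bytes
    for idx, flag in enumerate(flags):
        if flag:
            mask[idx // 8] |= 1 << (idx % 8)
    return mask


print(list(pack_bitmask([True, False, True, True])))  # [13], i.e. 0b00001101
```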
-
asin()
Get Trigonometric inverse sine, element-wise.
The inverse of sine so that, if y = x.sin(), then x = y.asin().
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.asin()
0   -1.570796
1    0.000000
2    1.570796
3    0.330314
4    0.523599
dtype: float64
asin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.asin()
      first    second
0 -1.570796  0.236190
1  0.000000  0.304693
2  0.523599  0.100167
asin operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64')
>>> index.asin()
Float64Index([-1.5707963267948966, 0.41151684606748806, 1.5707963267948966, 0.3046926540153975], dtype='float64')
-
astype(dtype, copy=False, errors='raise')
Cast the Series to the given dtype.
- Parameters
- dtype : data type, or dict of column name -> data type
Use a numpy.dtype or Python type to cast Series object to the same type. Alternatively, use {col: dtype, …}, where col is a series name and dtype is a numpy.dtype or Python type to cast to.
- copy : bool, default False
Return a deep copy when copy=True. Note that copy=False is the default, so changes to values may propagate to other cudf objects.
- errors : {‘raise’, ‘ignore’, ‘warn’}, default ‘raise’
Control raising of exceptions on invalid data for provided dtype.
- raise : allow exceptions to be raised.
- ignore : suppress exceptions. On error, return the original object.
- warn : print the last exception as a warning and return the original object.
- Returns
- out : Series
Returns self.copy(deep=copy) if dtype is the same as self.dtype.
-
atan()
Get Trigonometric inverse tangent, element-wise.
The inverse of tan so that, if y = x.tan(), then x = y.atan().
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10])
>>> ser
0    -1.00000
1     0.00000
2     1.00000
3     0.32434
4     0.50000
5   -10.00000
dtype: float64
>>> ser.atan()
0   -0.785398
1    0.000000
2    0.785398
3    0.313635
4    0.463648
5   -1.471128
dtype: float64
atan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.atan()
      first    second
0 -0.785398  0.229864
1 -1.471128  0.291457
2  0.463648  1.471128
atan operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.atan()
Float64Index([-0.7853981633974483, 0.3805063771123649, 0.7853981633974483, 0.0, 0.2914567944778671], dtype='float64')
-
property
cat
Accessor object for categorical properties of the Series values. Be aware that assigning to categories is an inplace operation, while all methods return new categorical data by default.
- Parameters
- data : Series or CategoricalIndex
Examples
>>> s = cudf.Series([1,2,3], dtype='category')
>>> s
0    1
1    2
2    3
dtype: category
Categories (3, int64): [1, 2, 3]
>>> s.cat.categories
Int64Index([1, 2, 3], dtype='int64')
>>> s.cat.reorder_categories([3,2,1])
0    1
1    2
2    3
dtype: category
Categories (3, int64): [3, 2, 1]
>>> s.cat.remove_categories([1])
0    null
1       2
2       3
dtype: category
Categories (2, int64): [2, 3]
>>> s.cat.set_categories(list('abcde'))
0    null
1    null
2    null
dtype: category
Categories (5, object): [a, b, c, d, e]
>>> s.cat.as_ordered()
0    1
1    2
2    3
dtype: category
Categories (3, int64): [1 < 2 < 3]
>>> s.cat.as_unordered()
0    1
1    2
2    3
dtype: category
Categories (3, int64): [1, 2, 3]
-
ceil()
Rounds each value upward to the smallest integral value not less than the original.
Returns a new Series.
-
clip(lower=None, upper=None, inplace=False, axis=1)
Trim values at input threshold(s).
Assigns values outside the boundary to boundary values. Thresholds can be singular values or array-like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.
- Parameters
- lower : scalar or array_like, default None
Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.
- upper : scalar or array_like, default None
Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series, upper is expected to be a scalar or an array of size 1.
- inplace : bool, default False
- Returns
- Clipped DataFrame/Series/Index/MultiIndex
Examples
>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']})
>>> df.clip(lower=[2, 'b'], upper=[3, 'c'])
   a  b
0  2  b
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=None, upper=[3, 'c'])
   a  b
0  1  a
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=[2, 'b'], upper=None)
   a  b
0  2  b
1  2  b
2  3  c
3  4  d
>>> df.clip(lower=2, upper=3, inplace=True)
>>> df
   a  b
0  2  2
1  2  3
2  3  3
3  3  3
>>> import cudf
>>> sr = cudf.Series([1, 2, 3, 4])
>>> sr.clip(lower=2, upper=3)
0    2
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=None, upper=3)
0    1
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True)
>>> sr
0    2
1    2
2    3
3    4
dtype: int64
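For a Series with scalar thresholds, clip reduces to the element-wise rule below, shown in plain Python. clip_values is a hypothetical helper; it covers only scalar lower/upper and ignores inplace as well as the DataFrame and array-like threshold cases.

```python
from typing import List, Optional, Sequence


def clip_values(values: Sequence[float], lower: Optional[float] = None,
                upper: Optional[float] = None) -> List[float]:
    """Scalar-threshold clip; a None threshold disables that side."""
    out = []
    for v in values:
        if lower is not None and v < lower:
            v = lower            # raise values below the lower threshold
        if upper is not None and v > upper:
            v = upper            # cap values above the upper threshold
        out.append(v)
    return out


print(clip_values([1, 2, 3, 4], lower=2, upper=3))  # [2, 2, 3, 3]
```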
-
copy(deep=True)
Make a copy of this object’s indices and data.
When deep=True (default), a new object will be created with a copy of the calling object’s data and indices. Modifications to the data or indices of the copy will not be reflected in the original object. When deep=False, a new object will be created without copying the calling object’s data or index (only references to the data and index are copied). Any changes to the data of the original will be reflected in the shallow copy (and vice versa).
- Parameters
- deep : bool, default True
Make a deep copy, including a copy of the data and the indices. With deep=False neither the indices nor the data are copied.
- Returns
- copy : Series or DataFrame
Object type matches caller.
Examples
>>> s = cudf.Series([1, 2], index=["a", "b"])
>>> s
a    1
b    2
dtype: int64
>>> s_copy = s.copy()
>>> s_copy
a    1
b    2
dtype: int64
Shallow copy versus default (deep) copy:
>>> s = cudf.Series([1, 2], index=["a", "b"])
>>> deep = s.copy()
>>> shallow = s.copy(deep=False)
Shallow copy shares data and index with original.
>>> s is shallow
False
>>> s._column is shallow._column and s.index is shallow.index
True
Deep copy has own copy of data and index.
>>> s is deep
False
>>> s.values is deep.values or s.index is deep.index
False
Updates to the data shared by the shallow copy and the original are reflected in both; the deep copy remains unchanged.
>>> s['a'] = 3
>>> shallow['b'] = 4
>>> s
a    3
b    4
dtype: int64
>>> shallow
a    3
b    4
dtype: int64
>>> deep
a    1
b    2
dtype: int64
-
corr(other, method='pearson', min_periods=None)
Calculates the sample correlation between two Series, excluding missing values.
Examples
>>> import cudf
>>> ser1 = cudf.Series([0.9, 0.13, 0.62])
>>> ser2 = cudf.Series([0.12, 0.26, 0.51])
>>> ser1.corr(ser2)
-0.20454263717316112
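With the default method='pearson', the value above is the Pearson coefficient: the sum of products of deviations from the two means, divided by the square root of the product of the two sums of squared deviations. The plain-Python sketch below reproduces the example. pearson_corr is a hypothetical helper, and dropping every pair in which either side is None is an assumption about how "excluding missing values" works (pairwise exclusion).

```python
import math
from typing import Optional, Sequence


def pearson_corr(x: Sequence[Optional[float]], y: Sequence[Optional[float]]) -> float:
    # Keep only positions where both values are present (pairwise exclusion).
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two complete pairs")
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    dx = [a - mean_x for a, _ in pairs]
    dy = [b - mean_y for _, b in pairs]
    num = sum(p * q for p, q in zip(dx, dy))
    den = math.sqrt(sum(p * p for p in dx) * sum(q * q for q in dy))
    return num / den


print(round(pearson_corr([0.9, 0.13, 0.62], [0.12, 0.26, 0.51]), 6))  # -0.204543
```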
-
cos()
Get Trigonometric cosine, element-wise. Inputs are interpreted as radians.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.cos()
0    1.000000
1    0.947861
2    0.877583
3    0.525322
4   -0.448074
5   -0.598460
6   -0.283691
dtype: float64
cos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.cos()
      first    second
0  1.000000  0.862319
1  0.283662 -0.283691
2 -0.839072 -0.839039
3 -0.759688 -0.022097
cos operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.cos()
Float64Index([0.9210609940028851, 0.8623188722876839, -0.5984600690578581, -0.4480736161291701], dtype='float64')
-
count(level=None, **kwargs)
Return number of non-NA/null observations in the Series.
- Returns
- int
Number of non-null values in the Series.
Notes
The level parameter is currently not supported.
Examples
>>> import cudf
>>> ser = cudf.Series([1, 5, 2, 4, 3])
>>> ser.count()
5
-
cov(other, min_periods=None)
Compute covariance with Series, excluding missing values.
- Parameters
- other : Series
Series with which to compute the covariance.
- Returns
- float
Covariance between Series and other normalized by N-1 (unbiased estimator).
Notes
The min_periods parameter is not yet supported.
Examples
>>> import cudf
>>> ser1 = cudf.Series([0.9, 0.13, 0.62])
>>> ser2 = cudf.Series([0.12, 0.26, 0.51])
>>> ser1.cov(ser2)
-0.015750000000000004
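The N-1 normalization can be checked by hand: with N complete pairs, cov = sum((x_i - mean_x) * (y_i - mean_y)) / (N - 1). In the example above the product sum is -0.0315 and N - 1 = 2, giving -0.01575. A plain-Python sketch follows; sample_cov is a hypothetical helper, and dropping pairs that contain None assumes pairwise exclusion of missing values.

```python
from typing import Optional, Sequence


def sample_cov(x: Sequence[Optional[float]], y: Sequence[Optional[float]]) -> float:
    # Drop any position where either value is missing.
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    # Unbiased estimator: divide by N - 1 rather than N.
    return sum((a - mean_x) * (b - mean_y) for a, b in pairs) / (n - 1)


print(round(sample_cov([0.9, 0.13, 0.62], [0.12, 0.26, 0.51]), 6))  # -0.01575
```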
-
cummax(axis=0, skipna=True, *args, **kwargs)
Return cumulative maximum of the Series.
- Parameters
- skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- Returns
- Series
Notes
The axis parameter is currently not supported.
Examples
>>> import cudf
>>> ser = cudf.Series([1, 5, 2, 4, 3])
>>> ser.cummax()
0    1
1    5
2    5
3    5
4    5
dtype: int64
-
cummin(axis=None, skipna=True, *args, **kwargs)
Return cumulative minimum of the Series.
- Parameters
- skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- Returns
- Series
Notes
The axis parameter is currently not supported.
Examples
>>> import cudf
>>> ser = cudf.Series([1, 5, 2, 4, 3])
>>> ser.cummin()
0    1
1    1
2    1
3    1
4    1
dtype: int64
-
cumprod(axis=0, skipna=True, *args, **kwargs)
Return cumulative product of the Series.
- Parameters
- skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- Returns
- Series
Notes
The axis parameter is currently not supported.
Examples
>>> import cudf
>>> ser = cudf.Series([1, 5, 2, 4, 3])
>>> ser.cumprod()
0      1
1      5
2     10
3     40
4    120
dtype: int64
-
cumsum(axis=0, skipna=True, *args, **kwargs)
Return cumulative sum of the Series.
- Parameters
- skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- Returns
- Series
Notes
The axis parameter is currently not supported.
Examples
>>> import cudf
>>> ser = cudf.Series([1, 5, 2, 4, 3])
>>> ser.cumsum()
0     1
1     6
2     8
3    12
4    15
dtype: int64
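The skipna behaviour of the four cumulative methods can be modelled with one plain-Python helper, where None marks a null. The semantics here follow pandas (a null input stays null in the output; with skipna=True the running value skips over it, with skipna=False every later position is null too), which is an assumption about cuDF beyond what the notes above state. cumulative is a hypothetical name.

```python
import operator
from typing import Callable, List, Optional


def cumulative(values: List[Optional[float]],
               op: Callable[[float, float], float],
               skipna: bool = True) -> List[Optional[float]]:
    """Running op over values, with None as null (pandas-style semantics)."""
    out: List[Optional[float]] = []
    acc: Optional[float] = None   # running result over the non-null values seen so far
    seen_null = False
    for v in values:
        if v is None:
            seen_null = True
            out.append(None)      # a null input position stays null
        elif seen_null and not skipna:
            out.append(None)      # skipna=False: nothing after the first null is computed
        else:
            acc = v if acc is None else op(acc, v)
            out.append(acc)
    return out


ser = [1, 5, 2, 4, 3]
print(cumulative(ser, operator.add))  # cumsum -> [1, 6, 8, 12, 15]
print(cumulative(ser, max))           # cummax -> [1, 5, 5, 5, 5]
```

Passing operator.mul or the builtin min gives the cumprod and cummin results shown above.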
-
property
data
The GPU buffer for the data.
-
describe(percentiles=None, include=None, exclude=None)
Compute summary statistics of a Series. For numeric data, the output includes the minimum, maximum, mean, median, standard deviation, and various quantiles. For object data, the output includes the count, number of unique values, the most common value, and the number of occurrences of the most common value.
- Parameters
- percentiles : list-like, optional
The percentiles used to generate the output summary statistics. If None, the default percentiles used are the 25th, 50th and 75th. Values should be within the interval [0, 1].
- Returns
- A DataFrame containing summary statistics of relevant columns from the input DataFrame.
Examples
Describing a Series containing numeric values.
>>> import cudf
>>> s = cudf.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> print(s.describe())
   stats   values
0  count     10.0
1   mean      5.5
2    std  3.02765
3    min      1.0
4    25%      2.5
5    50%      5.5
6    75%      7.5
7    max     10.0
-
diff(periods=1)
Calculate the difference between values at positions i and i - N, where N is periods, in an array and store the output in a new array.
Notes
Diff currently only supports float and integer dtype columns with no null values.
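Concretely, out[i] = x[i] - x[i - periods]. The sketch below assumes the pandas convention that positions with no partner (the first periods entries, or the last ones for a negative periods) become null, shown as None; diff_values is a hypothetical helper.

```python
from typing import List, Optional, Sequence


def diff_values(values: Sequence[float], periods: int = 1) -> List[Optional[float]]:
    """out[i] = values[i] - values[i - periods]; None where no partner exists."""
    n = len(values)
    out: List[Optional[float]] = []
    for i in range(n):
        j = i - periods
        # A negative periods looks forward, so j can also run past the end.
        out.append(values[i] - values[j] if 0 <= j < n else None)
    return out


print(diff_values([1, 4, 9, 16]))     # [None, 3, 5, 7]
print(diff_values([1, 4, 9, 16], 2))  # [None, None, 8, 12]
```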
-
digitize(bins, right=False)
Return the indices of the bins to which each value in series belongs.
- Parameters
- bins : np.array
1-D monotonically increasing array with the same type as this series.
- right : bool
Indicates whether interval contains the right or left bin edge.
- Returns
- A new Series containing the indices.
Notes
Monotonicity of bins is assumed and not checked.
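Assuming digitize follows numpy.digitize semantics (not stated above), right=False returns i such that bins[i-1] <= x < bins[i], and right=True returns i such that bins[i-1] < x <= bins[i]. Both reduce to a binary search, as in this plain-Python sketch using the standard bisect module; digitize_values is a hypothetical helper.

```python
import bisect
from typing import List, Sequence


def digitize_values(values: Sequence[float], bins: Sequence[float],
                    right: bool = False) -> List[int]:
    """Bin index per value, assuming numpy.digitize semantics for increasing bins."""
    if right:
        # bins[i-1] < x <= bins[i]: count of bin edges strictly below x
        return [bisect.bisect_left(bins, x) for x in values]
    # bins[i-1] <= x < bins[i]: count of bin edges at or below x
    return [bisect.bisect_right(bins, x) for x in values]


print(digitize_values([0.2, 6.4, 3.0, 1.6], [0.0, 1.0, 2.5, 4.0, 10.0]))  # [1, 4, 3, 2]
```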
-
drop_duplicates(keep='first', inplace=False, ignore_index=False)
Return Series with duplicate values removed.
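The keep parameter chooses which occurrence survives. Assuming the pandas convention ('first', 'last', or False to drop every duplicated value; the accepted values are not listed above), the logic looks like this in plain Python. drop_duplicate_values is a hypothetical helper; it preserves input order, whereas the order of cuDF's result is not specified here.

```python
from typing import Hashable, List, Sequence, Union


def drop_duplicate_values(values: Sequence[Hashable],
                          keep: Union[str, bool] = "first") -> List[Hashable]:
    """Remove duplicates, keeping the 'first', 'last', or no (False) occurrence."""
    if keep is False:
        counts: dict = {}
        for v in values:
            counts[v] = counts.get(v, 0) + 1
        return [v for v in values if counts[v] == 1]  # drop every value that repeats
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first', 'last' or False")
    ordered = list(values) if keep == "first" else list(reversed(values))
    seen = set()
    out = []
    for v in ordered:
        if v not in seen:        # first time this value appears in scan order
            seen.add(v)
            out.append(v)
    return out if keep == "first" else out[::-1]  # restore input order for 'last'


print(drop_duplicate_values([1, 2, 1, 3, 2]))               # [1, 2, 3]
print(drop_duplicate_values([1, 2, 1, 3, 2], keep="last"))  # [1, 3, 2]
```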
-
dropna(axis=0, inplace=False, how=None)
Return a Series with null values removed.
- Parameters
- axis : {0 or ‘index’}, default 0
There is only one axis to drop values from.
- inplace : bool, default False
If True, do operation inplace and return None.
- how : str, optional
Not in use. Kept for compatibility.
- Returns
- Series
Series with null entries dropped from it.
See also
Series.isna : Indicate null values.
Series.notna : Indicate non-null values.
Series.fillna : Replace null values.
cudf.core.dataframe.DataFrame.dropna : Drop rows or columns which contain null values.
cudf.core.index.Index.dropna : Drop null indices.
Examples
>>> import cudf
>>> ser = cudf.Series([1, 2, None])
>>> ser
0       1
1       2
2    null
dtype: int64
Drop null values from a Series.
>>> ser.dropna()
0    1
1    2
dtype: int64
Keep the Series with valid entries in the same variable.
>>> ser.dropna(inplace=True)
>>> ser
0    1
1    2
dtype: int64
Empty strings are not considered null values. None is considered a null value.
>>> ser = cudf.Series(['', None, 'abc'])
>>> ser
0
1    None
2     abc
dtype: object
>>> ser.dropna()
0
2    abc
dtype: object
-
property
dt
Accessor object for datetimelike properties of the Series values.
- Returns
- A Series indexed like the original Series.
- Raises
- TypeError if the Series does not contain datetimelike values.
Examples
>>> s.dt.hour
>>> s.dt.second
>>> s.dt.day
-
property
dtype
dtype of the Series.
-
property
empty
Indicator whether DataFrame or Series is empty.
True if DataFrame/Series is entirely empty (no items), meaning any of the axes are of length 0.
- Returns
- out : bool
If DataFrame/Series is empty, return True, if not return False.
Notes
If DataFrame/Series contains only null values, it is still not considered empty. See the example below.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'A' : []})
>>> df
Empty DataFrame
Columns: [A]
Index: []
>>> df.empty
True
If we only have null values in our DataFrame, it is not considered empty! We will need to drop the nulls to make the DataFrame empty:
>>> df = cudf.DataFrame({'A' : [None, None]})
>>> df
      A
0  null
1  null
>>> df.empty
False
>>> df.dropna().empty
True
Non-empty and empty Series example:
>>> s = cudf.Series([1, 2, None])
>>> s
0       1
1       2
2    null
dtype: int64
>>> s.empty
False
>>> s = cudf.Series([])
>>> s
Series([], dtype: float64)
>>> s.empty
True
-
eq(other, fill_value=None, axis=0)
Equal to of series and other, element-wise (binary operator eq).
- Parameters
- other : Series or scalar value
- fill_value : None or value
Value to fill nulls with before computation. If data in both corresponding Series locations is null, the result will be null.
-
equals(other)
Test whether two objects contain the same elements. This function allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements. NaNs in the same location are considered equal. The column headers do not need to have the same type.
- Parameters
- other : Series or DataFrame
The other Series or DataFrame to be compared with the first.
- Returns
- bool
True if all elements are the same in both objects, False otherwise.
Examples
>>> import cudf
>>> s = cudf.Series([1, 2, 3])
>>> other = cudf.Series([1, 2, 3])
>>> s.equals(other)
True
>>> different = cudf.Series([1.5, 2, 3])
>>> s.equals(different)
False
-
exp()
Get the exponential of all elements, element-wise.
Exponential is the inverse of the log function, so that x.exp().log() = x.
- Returns
- DataFrame/Series/Index
Result of the element-wise exponential.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.exp()
0    3.678794e-01
1    1.000000e+00
2    2.718282e+00
3    1.383117e+00
4    1.648721e+00
5    4.539993e-05
6    2.688117e+43
dtype: float64
exp operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.exp()
      first        second
0  0.367879      1.263644
1  0.000045      1.349859
2  1.648721  22026.465795
exp operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.exp()
Float64Index([0.36787944117144233, 1.4918246976412703, 2.718281828459045, 1.0, 1.3498588075760032], dtype='float64')
-
factorize(na_sentinel=-1)
Encode the input values as integer labels.
- Parameters
- na_sentinel : number
Value to indicate missing category.
- Returns
- (labels, cats) : (Series, Series)
labels contains the encoded values.
cats contains the categories, ordered so that the item at position i corresponds to code i.
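A plain-Python model of factorize: each distinct non-null value gets the next integer code, nulls (None) get na_sentinel, and cats[code] recovers the original value. Numbering categories in order of first appearance is an assumption of this sketch; cuDF may order cats differently (for example, sorted). factorize_values is a hypothetical helper.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def factorize_values(values: Sequence[Optional[Hashable]],
                     na_sentinel: int = -1) -> Tuple[List[int], List[Hashable]]:
    """Return (labels, cats) such that cats[labels[i]] == values[i] for non-null i."""
    cats: List[Hashable] = []
    code_of: Dict[Hashable, int] = {}
    labels: List[int] = []
    for v in values:
        if v is None:
            labels.append(na_sentinel)   # missing values never become a category
            continue
        if v not in code_of:
            code_of[v] = len(cats)       # next unused code, in first-appearance order
            cats.append(v)
        labels.append(code_of[v])
    return labels, cats


print(factorize_values(["b", "a", None, "b"]))  # ([0, 1, -1, 0], ['b', 'a'])
```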
-
fillna(value, method=None, axis=None, inplace=False, limit=None)
Fill null values with value.
- Parameters
- value : scalar, Series-like or dict
Value to use to fill nulls. If Series-like, null values are filled with values in corresponding indices. A dict can be used to provide different values to fill nulls in different columns.
- Returns
- result : Series or DataFrame
Copy with nulls filled.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, None], 'b': [3, None, 5]})
>>> df
      a     b
0     1     3
1     2  null
2  null     5
>>> df.fillna(4)
   a  b
0  1  3
1  2  4
2  4  5
>>> df.fillna({'a': 3, 'b': 4})
   a  b
0  1  3
1  2  4
2  3  5
fillna on a Series object:
>>> ser = cudf.Series(['a', 'b', None, 'c'])
>>> ser
0       a
1       b
2    None
3       c
dtype: object
>>> ser.fillna('z')
0    a
1    b
2    z
3    c
dtype: object
fillna also supports inplace operation:
>>> ser.fillna('z', inplace=True)
>>> ser
0    a
1    b
2    z
3    c
dtype: object
>>> df.fillna({'a': 3, 'b': 4}, inplace=True)
>>> df
   a  b
0  1  3
1  2  4
2  3  5
-
floor()
Round each value downward to the largest integral value not greater than the original.
Returns a new Series.
-
floordiv(other, fill_value=None, axis=0)
Integer division of series and other, element-wise (binary operator floordiv). Equivalent to series // other.
- Parameters
- other : Series or scalar value
- fill_value : None or value
Value to fill nulls with before computation. If data in both corresponding Series locations is null, the result will be null.
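The fill_value rule shared by the binary operators in this section can be illustrated in plain Python, with None standing in for null. This is a sketch of the documented behaviour using a hypothetical helper floordiv_with_fill, not cuDF's implementation: a null on one side is replaced by fill_value before the operation, while a position that is null on both sides stays null.

```python
from typing import List, Optional, Sequence


def floordiv_with_fill(
    left: Sequence[Optional[int]],
    right: Sequence[Optional[int]],
    fill_value: Optional[int] = None,
) -> List[Optional[int]]:
    """Hypothetical helper mimicking series.floordiv(other, fill_value)."""
    result: List[Optional[int]] = []
    for a, b in zip(left, right):
        if a is None and b is None:
            # Null on both sides: the result stays null even with fill_value.
            result.append(None)
            continue
        if fill_value is not None:
            # At most one side is null here; substitute fill_value for it.
            a = fill_value if a is None else a
            b = fill_value if b is None else b
        # Without a fill_value, any remaining null propagates to the result.
        result.append(None if a is None or b is None else a // b)
    return result


print(floordiv_with_fill([10, None, None, 7], [3, 4, None, None]))
# [3, None, None, None]
print(floordiv_with_fill([10, None, None, 7], [3, 4, None, None], fill_value=1))
# [3, 0, None, 7]
```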
-
classmethod
from_arrow(s)
Convert from a PyArrow Array.
- Parameters
- s : PyArrow Object
PyArrow object to convert to a cuDF Series.
- Raises
- TypeError for invalid input type.
Examples
>>> import pyarrow as pa
>>> import cudf
>>> data = pa.array([1, 2, 3])
>>> data
<pyarrow.lib.Int64Array object at 0x7f67007e07c0>
[
  1,
  2,
  3
]
>>> cudf.Series.from_arrow(data)
0    1
1    2
2    3
dtype: int64
-
classmethod
from_categorical(categorical, codes=None)
Create a Series from a pandas.Categorical.
If codes is defined, use it instead of categorical.codes.
-
classmethod
from_masked_array(data, mask, null_count=None)
Create a Series with null-mask. This is equivalent to:
Series(data).set_mask(mask, null_count=null_count)
- Parameters
- data : 1D array-like
The values. Null positions must still be present in data (not skipped); the values stored at those positions are ignored and can be arbitrary.
- mask : 1D array-like
The null-mask. Valid values are marked as 1; otherwise 0. The mask bit for data index idx is computed as:
(mask[idx // 8] >> (idx % 8)) & 1
- null_count : int, optional
The number of null values. If None, it is calculated automatically.
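The mask layout described above packs one validity bit per element, least-significant bit first within each byte. Here is a plain-Python sketch (the helper names pack_validity and is_valid are hypothetical) that builds such a mask from booleans and reads it back with the same (mask[idx // 8] >> (idx % 8)) & 1 formula. Which buffer types cuDF accepts for mask (for example a NumPy uint8 array) is not covered here and should be treated as an assumption.

```python
from typing import Sequence


def pack_validity(valid: Sequence[bool]) -> bytearray:
    """Hypothetical helper: pack one validity bit per element into bytes."""
    mask = bytearray((len(valid) + 7) // 8)  # ceil(len / 8) bytes, all zero
    for idx, ok in enumerate(valid):
        if ok:
            # Element idx lives in byte idx // 8, at bit position idx % 8.
            mask[idx // 8] |= 1 << (idx % 8)
    return mask


def is_valid(mask: Sequence[int], idx: int) -> bool:
    """Read the bit back with the formula from the parameter description."""
    return ((mask[idx // 8] >> (idx % 8)) & 1) == 1


valid = [True, False, True, True, False, False, False, False, True]
mask = pack_validity(valid)
print(list(mask))               # [13, 1]: bits 0, 2, 3 of byte 0; bit 0 of byte 1
print(len(valid) - sum(valid))  # 5 nulls, the value null_count would take
```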
-
classmethod
from_pandas(s, nan_as_null=None)
Convert from a Pandas Series.
- Parameters
- s : Pandas Series object
A Pandas Series object to convert to a cuDF Series.
- nan_as_null : bool, default None
If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.
- Raises
- TypeError for invalid input type.
Examples
>>> import cudf
>>> import pandas as pd
>>> import numpy as np
>>> data = [10, 20, 30, np.nan]
>>> pds = pd.Series(data)
>>> cudf.Series.from_pandas(pds)
0    10.0
1    20.0
2    30.0
3    null
dtype: float64
>>> cudf.Series.from_pandas(pds, nan_as_null=False)
0    10.0
1    20.0
2    30.0
3     NaN
dtype: float64
-
ge(other, fill_value=None, axis=0)
Greater than or equal to of series and other, element-wise (binary operator ge). Equivalent to series >= other.
- Parameters
- other : Series or scalar value
- fill_value : None or value
Value to fill nulls with before computation. If data in both corresponding Series locations is null, the result will be null.
-
groupby(by=None, group_series=None, level=None, sort=True, group_keys=True, as_index=None, dropna=True, method=None)
Group Series using a mapper or by a Series of columns.
A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
- Parameters
- by : mapping, function, label, or list of labels
Used to determine the groups for the groupby. If by is a function, it’s called on each value of the object’s index. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups (the Series’ values are first aligned; see .align() method). If a CuPy array is passed, the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.
- level : int, level name, or sequence of such, default None
If the axis is a MultiIndex (hierarchical), group by a particular level or levels.
- as_index : bool, default True
For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output.
- sort : bool, default True
Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.
- Returns
- SeriesGroupBy
Returns a groupby object that contains information about the groups.
Examples
>>> import cudf
>>> ser = cudf.Series([390., 350., 30., 20.],
...                   index=['Falcon', 'Falcon', 'Parrot', 'Parrot'],
...                   name="Max Speed")
>>> ser
Falcon    390.0
Falcon    350.0
Parrot     30.0
Parrot     20.0
Name: Max Speed, dtype: float64
>>> ser.groupby(level=0).mean()
Falcon    370.0
Parrot     25.0
Name: Max Speed, dtype: float64
>>> ser.groupby(ser > 100).mean()
Max Speed
False     25.0
True     370.0
Name: Max Speed, dtype: float64
-
gt(other, fill_value=None, axis=0)
Greater than of series and other, element-wise (binary operator gt). Equivalent to series > other.
- Parameters
- other : Series or scalar value
- fill_value : None or value
Value to fill nulls with before computation. If data in both corresponding Series locations is null, the result will be null.
-
property
has_nulls
Indicator whether Series contains null values.
- Returns
- out : bool
True if the Series has at least one null value, otherwise False.
-
hash_encode(stop, use_name=False)
Encode column values as ints in [0, stop) using a hash function.
- Parameters
- stop : int
The upper bound on the encoding range.
- use_name : bool
If True, combine hashed column values with the hashed column name. This is useful when the same values in different columns should be encoded with different hashed values.
- Returns
- result : Series
The encoded Series.
-
hash_values()
Compute the hash of values in this column.
-
head(n=5)
Return the first n rows.
This function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it. For negative values of n, this function returns all rows except the last |n| rows, equivalent to df[:n].
- Parameters
- n : int, default 5
Number of rows to select.
- Returns
- same type as caller
The first n rows of the caller object.
See also
Series.tail : Returns the last n rows.
Examples
>>> ser = cudf.Series(['alligator', 'bee', 'falcon',
...                    'lion', 'monkey', 'parrot', 'shark', 'whale', 'zebra'])
>>> ser
0    alligator
1          bee
2       falcon
3         lion
4       monkey
5       parrot
6        shark
7        whale
8        zebra
dtype: object
Viewing the first 5 lines
>>> ser.head()
0    alligator
1          bee
2       falcon
3         lion
4       monkey
dtype: object
Viewing the first n lines (three in this case)
>>> ser.head(3)
0    alligator
1          bee
2       falcon
dtype: object
For negative values of n
>>> ser.head(-3)
0    alligator
1          bee
2       falcon
3         lion
4       monkey
5       parrot
dtype: object
-
property
iloc
Select values by position.
-
property
index
The index object.
-
interleave_columns()
Interleave Series columns of a table into a single column.
Converts the column-major table into a row-major column: the values of row 0 from each column come first, then those of row 1, and so on.
- Returns
- Series
The interleaved columns as a single column.
Examples
>>> import cudf
>>> df = cudf.DataFrame({0: ['A1', 'A2', 'A3'], 1: ['B1', 'B2', 'B3']})
>>> df
    0   1
0  A1  B1
1  A2  B2
2  A3  B3
>>> df.interleave_columns()
0    A1
1    B1
2    A2
3    B2
4    A3
5    B3
dtype: object
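A plain-Python sketch of the row-major interleaving, treating each column as a list. The helper name interleave_columns_sketch is hypothetical and this is not cuDF's implementation.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def interleave_columns_sketch(columns: Sequence[Sequence[T]]) -> List[T]:
    """Hypothetical helper: flatten column-major data into one row-major list."""
    if not columns:
        return []
    n_rows = len(columns[0])
    if any(len(col) != n_rows for col in columns):
        raise ValueError("all columns must have the same length")
    # Walk the rows in order and, within each row, take one value per column.
    return [col[row] for row in range(n_rows) for col in columns]


print(interleave_columns_sketch([["A1", "A2", "A3"], ["B1", "B2", "B3"]]))
# ['A1', 'B1', 'A2', 'B2', 'A3', 'B3']
```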
-
property
is_monotonic
Return True if values in the object are monotonically increasing.
- Returns
- out : bool
-
property
is_monotonic_decreasing
Return True if values in the object are monotonically decreasing.
- Returns
- out : bool
-
property
is_monotonic_increasing
Return True if values in the object are monotonically increasing.
- Returns
- out : bool
-
property
is_unique
Return True if values in the object are unique.
- Returns
- out : bool
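The three properties above can be expressed on a plain Python list as follows. This is a sketch with hypothetical helper names; it assumes there are no nulls (how nulls affect the result is not described above) and treats "monotonic" in the non-strict sense, so repeated values are allowed.

```python
from typing import Sequence


def is_monotonic_increasing(values: Sequence) -> bool:
    # Non-strict: every element is >= the one before it.
    return all(a <= b for a, b in zip(values, values[1:]))


def is_monotonic_decreasing(values: Sequence) -> bool:
    # Non-strict: every element is <= the one before it.
    return all(a >= b for a, b in zip(values, values[1:]))


def is_unique(values: Sequence) -> bool:
    # Unique when no value repeats (values must be hashable here).
    return len(set(values)) == len(values)


print(is_monotonic_increasing([1, 2, 2, 3]), is_unique([1, 2, 2, 3]))  # True False
print(is_monotonic_decreasing([3, 2, 1]))  # True
```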
-
isin(values)
Check whether values are contained in Series.
- Parameters
- values : set or list-like
The sequence of values to test. Passing in a single string will raise a TypeError. Instead, turn a single string into a list of one element.
- Returns
- result : Series
Series of booleans indicating if each element is in values.
- Raises
- TypeError
If values is a string.
-
isna()
Identify missing values. Alias for isnull.
-
isnull()
Identify missing values.
-
keys()
Return the index of the Series (alias for Series.index).
- Returns
- Index
Index of the Series.
Examples
>>> import cudf
>>> sr = cudf.Series([10, 11, 12, 13, 14, 15])
>>> sr
0    10
1    11
2    12
3    13
4    14
5    15
dtype: int64
>>> sr.keys()
RangeIndex(start=0, stop=6)
>>> sr = cudf.Series(['a', 'b', 'c'])
>>> sr
0    a
1    b
2    c
dtype: object
>>> sr.keys()
RangeIndex(start=0, stop=3)
>>> sr = cudf.Series([1, 2, 3], index=['a', 'b', 'c'])
>>> sr
a    1
b    2
c    3
dtype: int64
>>> sr.keys()
StringIndex(['a' 'b' 'c'], dtype='object')
-
kurt(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
Return Fisher’s unbiased kurtosis of a sample.
Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.
- Parameters
- skipna : bool, default True
Exclude NA/null values when computing the result.
- Returns
- scalar
Notes
Parameters currently not supported are axis, level and numeric_only.
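The description above corresponds to the bias-corrected sample excess kurtosis (often written G2), the same quantity pandas' Series.kurt returns. Assuming cuDF follows that pandas-compatible definition, here is a plain-Python sketch of the formula. The helper name is hypothetical; to mimic skipna=True, drop nulls before calling it.

```python
from typing import Sequence


def unbiased_excess_kurtosis(values: Sequence[float]) -> float:
    """Hypothetical helper: Fisher (excess) kurtosis with small-sample correction."""
    n = len(values)
    if n < 4:
        # The correction divides by (n - 2)(n - 3), so at least 4 values are needed.
        return float("nan")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    m4 = sum((x - mean) ** 4 for x in values)  # sum of fourth-power deviations
    if m2 == 0:
        return float("nan")  # constant data: left undefined in this sketch
    numerator = n * (n + 1) * (n - 1) * m4
    denominator = (n - 2) * (n - 3) * m2 ** 2
    # Subtracting this term makes normally distributed data score about 0.0.
    adjustment = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return numerator / denominator - adjustment


print(round(unbiased_excess_kurtosis([1, 2, 3, 4, 5]), 6))  # -1.2
```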
-
kurtosis(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
Return Fisher’s unbiased kurtosis of a sample.
Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.
- Parameters
- skipna : bool, default True
Exclude NA/null values when computing the result.
- Returns
- scalar
Notes
Parameters currently not supported are axis, level and numeric_only.
-
label_encoding(cats, dtype=None, na_sentinel=-1)
Perform label encoding.
- Parameters
- cats : sequence
The categories to encode against; each value of the Series is encoded as the position of that value in cats.
- dtype : numpy.dtype, optional
Specifies the output dtype. If None is given, the smallest possible integer dtype (starting with np.int8) is used.
- na_sentinel : number, default -1
Value to indicate missing category.
- Returns
- A sequence of encoded labels with values between 0 and len(cats) - 1; values not found in cats are set to na_sentinel.
Examples
>>> import cudf
>>> s = cudf.Series([1, 2, 3, 4, 10])
>>> s.label_encoding([2, 3])
0   -1
1    0
2    1
3   -1
4   -1
dtype: int8
The na_sentinel parameter can be used to control the value when there is no encoding.
>>> s.label_encoding([2, 3], na_sentinel=10)
0    10
1     0
2     1
3    10
4    10
dtype: int8
When none of the cats values exist in s, the entire Series will be na_sentinel.
>>> s.label_encoding(['a', 'b', 'c'])
0   -1
1   -1
2   -1
3   -1
4   -1
dtype: int8
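Here is a plain-Python sketch of the encoding rule shown in the examples above, plus one possible way to pick the "smallest possible integer dtype". Both helpers are hypothetical. In particular, sizing the dtype so that it holds the codes 0 .. len(cats) - 1 and na_sentinel is an assumption about how cuDF chooses it, not documented behaviour.

```python
from typing import Hashable, List, Sequence


def label_encode_sketch(
    values: Sequence[Hashable], cats: Sequence[Hashable], na_sentinel: int = -1
) -> List[int]:
    """Hypothetical helper: map each value to its position in cats."""
    position = {cat: code for code, cat in enumerate(cats)}
    # Values that are not among the categories get na_sentinel.
    return [position.get(v, na_sentinel) for v in values]


def smallest_int_dtype(n_cats: int, na_sentinel: int = -1) -> str:
    """Hypothetical helper: narrowest signed integer type holding every code."""
    lo = min(0, na_sentinel)
    hi = max(n_cats - 1, na_sentinel)
    for bits in (8, 16, 32, 64):
        if -(2 ** (bits - 1)) <= lo and hi < 2 ** (bits - 1):
            return f"int{bits}"
    raise OverflowError("codes do not fit in int64")


print(label_encode_sketch([1, 2, 3, 4, 10], [2, 3]))  # [-1, 0, 1, -1, -1]
print(smallest_int_dtype(2), smallest_int_dtype(300))  # int8 int16
```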
-
le(other, fill_value=None, axis=0)
Less than or equal to of series and other, element-wise (binary operator le). Equivalent to series <= other.
- Parameters
- other : Series or scalar value
- fill_value : None or value
Value to fill nulls with before computation. If data in both corresponding Series locations is null, the result will be null.
-
property
loc
Select values by label.
-
log()
Get the natural logarithm of all elements, element-wise.
Natural logarithm is the inverse of the exp function, so that x.log().exp() == x for positive x (up to floating-point rounding).
- Returns
- DataFrame/Series/Index
Result of the element-wise natural logarithm.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.log()
0         NaN
1        -inf
2    0.000000
3   -1.125963
4   -0.693147
5         NaN
6    4.605170
dtype: float64
log operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.log()
      first    second
0       NaN -1.452434
1       NaN -1.203973
2 -0.693147  2.302585
log operation on Index:
>>> index = cudf.Index([10, 11, 500.0])
>>> index
Float64Index([10.0, 11.0, 500.0], dtype='float64')
>>> index.log()
Float64Index([2.302585092994046, 2.3978952727983707, 6.214608098422191], dtype='float64')
-
lt(other, fill_value=None, axis=0)
Less than of series and other, element-wise (binary operator lt). Equivalent to series < other.
- Parameters
- other : Series or scalar value
- fill_value : None or value
Value to fill nulls with before computation. If data in both corresponding Series locations is null, the result will be null.
-
mask(cond, other=None, inplace=False)
Replace values where the condition is True.
- Parameters
- cond : bool Series/DataFrame, array-like
Where cond is False, keep the original value. Where True, replace with the corresponding value from other. Callables are not supported.
- other : scalar, list of scalars, Series/DataFrame
Entries where cond is True are replaced with the corresponding value from other. Callables are not supported. Default is None, in which case the replaced entries become null.
A DataFrame accepts only a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.
A Series accepts only a scalar or a Series-like object of the same length.
- inplace : bool, default False
Whether to perform the operation in place on the data.
- Returns
- Same type as caller
Examples
>>> import cudf
>>> df = cudf.DataFrame({"A": [1, 4, 5], "B": [3, 5, 8]})
>>> df.mask(df % 2 == 0, [-1, -1])
    A   B
0   1   3
1  -1   5
2   5  -1
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.mask(ser > 2, 10)
0    10
1    10
2     2
3     1
4     0
dtype: int64
>>> ser.mask(ser > 2)
0    null
1    null
2       2
3       1
4       0
dtype: int64
-
max(axis=None, skipna=None, dtype=None, level=None, numeric_only=None, **kwargs)
Return the maximum of the values in the Series.
- Parameters
- skipna : bool, default True
Exclude NA/null values when computing the result.
- dtype : data type
Data type to cast the result to.
- Returns
- scalar
Notes
Parameters currently not supported are axis, level, numeric_only.
Examples
>>> import cudf
>>> ser = cudf.Series([1, 5, 2, 4, 3])
>>> ser.max()
5
-
mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
Return the mean of the values in the Series.
- Parameters
- skipna : bool, default True
Exclude NA/null values when computing the result.
- Returns
- scalar
Notes
Parameters currently not supported are axis, level and numeric_only.
Examples
>>> import cudf
>>> ser = cudf.Series([10, 25, 3, 25, 24, 6])
>>> ser.mean()
15.5
-
median(skipna=True)
Compute the median of the Series.
-
memory_usage(index=True, deep=False)
Return the memory usage of the Series.
The memory usage can optionally include the contribution of the index and of elements of object dtype.
- Parameters
- index : bool, default True
Specifies whether to include the memory usage of the Series index.
- deep : bool, default False
If True, introspect the data deeply by interrogating object dtypes for system-level memory consumption, and include it in the returned value.
- Returns
- int
Bytes of memory consumed.
See also
cudf.core.dataframe.DataFrame.memory_usage : Bytes consumed by a DataFrame.
Examples
>>> s = cudf.Series(range(3), index=['a', 'b', 'c'])
>>> s.memory_usage()
48
Not including the index gives the size of the rest of the data, which is necessarily smaller:
>>> s.memory_usage(index=False)
24
-
min(axis=None, skipna=None, dtype=None, level=None, numeric_only=None, **kwargs)
Return the minimum of the values in the Series.
- Parameters
- skipna : bool, default True
Exclude NA/null values when computing the result.
- dtype : data type
Data type to cast the result to.
- Returns
- scalar
Notes
Parameters currently not supported are axis, level, numeric_only.
Examples
>>> import cudf
>>> ser = cudf.Series([1, 5, 2, 4, 3])
>>> ser.min()
1
-
mod(other, fill_value=None, axis=0)
Modulo of series and other, element-wise (binary operator mod). Equivalent to series % other.
- Parameters
- other : Series or scalar value
- fill_value : None or value
Value to fill nulls with before computation. If data in both corresponding Series locations is null, the result will be null.
-
mul(other, fill_value=None, axis=0)
Multiplication of series and other, element-wise (binary operator mul). Equivalent to series * other.
- Parameters
- other : Series or scalar value
- fill_value : None or value
Value to fill nulls with before computation. If data in both corresponding Series locations is null, the result will be null.
-
property
name
Returns the name of the Series.
-
nans_to_nulls()
Convert NaNs (if any) to nulls.
-
property
ndim
Number of dimensions of the data. Series ndim is always 1.
-
ne(other, fill_value=None, axis=0)
Not equal to of series and other, element-wise (binary operator ne). Equivalent to series != other.
- Parameters
- other : Series or scalar value
- fill_value : None or value
Value to fill nulls with before computation. If data in both corresponding Series locations is null, the result will be null.
-
nlargest(n=5, keep='first')
Returns a new Series of the n largest elements.
-
notna()
Identify non-missing values. Alias for notnull.
-
notnull()
Identify non-missing values.
-
nsmallest(n=5, keep='first')
Returns a new Series of the n smallest elements.
-
property
null_count
Number of null values.
-
property
nullable
A boolean indicating whether a null-mask is needed.
-
property
nullmask
The GPU buffer for the null-mask.
-
nunique(method='sort', dropna=True)
Returns the number of unique values of the Series. If dropna is True, nulls are not counted.
-
one_hot_encoding(cats, dtype='float64')
Perform one-hot encoding.
- Parameters
- cats : sequence of values
Values representing each category.
- dtype : numpy.dtype
Specifies the output dtype.
- Returns
- Sequence
A sequence of new Series, one for each category. Its length is determined by the length of cats.
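Here is a plain-Python sketch of the output shape described above: one column per category, in the order of cats, holding 1.0 where the value equals that category and 0.0 elsewhere (floats, mirroring the default dtype='float64'). The helper name is hypothetical and this is not cuDF's implementation.

```python
from typing import Hashable, List, Sequence


def one_hot_sketch(values: Sequence[Hashable], cats: Sequence[Hashable]) -> List[List[float]]:
    """Hypothetical helper: one indicator column per category."""
    # The number of output columns equals len(cats).
    return [[1.0 if v == cat else 0.0 for v in values] for cat in cats]


print(one_hot_sketch(["a", "b", "a", "c"], ["a", "b"]))
# [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
```

A value that matches no category ('c' here) is 0.0 in every column.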
-
pow(other, fill_value=None, axis=0)
Exponential power of series and other, element-wise (binary operator pow). Equivalent to series ** other.
- Parameters
- other : Series or scalar value
- fill_value : None or value
Value to fill nulls with before computation. If data in both corresponding Series locations is null, the result will be null.
-
prod(axis=None, skipna=None, dtype=None, level=None, numeric_only=None, min_count=0, **kwargs)
Return the product of the values in the Series.
- Parameters
- skipna : bool, default True
Exclude NA/null values when computing the result.
- dtype : data type
Data type to cast the result to.
- min_count : int, default 0
The required number of valid values to perform the operation. If fewer than min_count non-NA values are present, the result will be NA.
With the default of 0, the product of an all-NA or empty Series is 1 (just as the sum of an all-NA or empty Series is 0).
- Returns
- scalar
Notes
Parameters currently not supported are axis, level, numeric_only.
Examples
>>> import cudf
>>> ser = cudf.Series([1, 5, 2, 4, 3])
>>> ser.prod()
120
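The interaction of skipna and min_count can be sketched in plain Python, with None standing in for null. The helper is hypothetical and models the pandas-style semantics described above; returning None for "NA", including when a null is encountered with skipna=False, is a choice of this sketch. math.prod requires Python 3.8 or later.

```python
import math
from typing import Optional, Sequence


def prod_sketch(
    values: Sequence[Optional[float]], skipna: bool = True, min_count: int = 0
) -> Optional[float]:
    """Hypothetical helper mirroring Series.prod(skipna=..., min_count=...)."""
    valid = [v for v in values if v is not None]
    if not skipna and len(valid) != len(values):
        return None  # a null is not skipped, so the product is null
    if len(valid) < min_count:
        return None  # too few valid values: the result is NA
    # math.prod of an empty list is 1, which is why an all-null or empty
    # Series has product 1 under the default min_count=0.
    return math.prod(valid)


print(prod_sketch([1, 5, 2, 4, 3]))            # 120
print(prod_sketch([None, None]))               # 1
print(prod_sketch([None, None], min_count=1))  # None
```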
-
product(axis=None, skipna=None, dtype=None, level=None, numeric_only=None, min_count=0, **kwargs)
Return the product of the values in the Series.
- Parameters
- skipna : bool, default True
Exclude NA/null values when computing the result.
- dtype : data type
Data type to cast the result to.
- min_count : int, default 0
The required number of valid values to perform the operation. If fewer than min_count non-NA values are present, the result will be NA.
With the default of 0, the product of an all-NA or empty Series is 1 (just as the sum of an all-NA or empty Series is 0).
- Returns
- scalar
Notes
Parameters currently not supported are axis, level, numeric_only.
Examples
>>> import cudf
>>> ser = cudf.Series([1, 5, 2, 4, 3])
>>> ser.product()
120
-
quantile(q=0.5, interpolation='linear', exact=True, quant_index=True)
Return values at the given quantile.
- Parameters
- q : float or array-like, default 0.5 (50% quantile)
0 <= q <= 1, the quantile(s) to compute.
- interpolation : {‘linear’, ‘lower’, ‘higher’, ‘midpoint’, ‘nearest’}
This optional parameter specifies the interpolation method to use when the desired quantile lies between two data points i and j:
* linear: i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j
* lower: i
* higher: j
* nearest: i or j, whichever is nearest
* midpoint: (i + j) / 2
- columns : list of str
List of column names to include.
- exact : boolean
Whether to use the approximate or the exact quantile algorithm.
- quant_index : boolean
Whether to use the list of quantiles as index.
- Returns
- DataFrame
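Here is a plain-Python sketch of the five interpolation modes for a single q on non-null data. The helper name is hypothetical. Placing the quantile at position q * (n - 1) of the sorted data follows the common NumPy/pandas convention and is an assumption here, as is using Python's round() (which rounds halves to even) to break ties for 'nearest'.

```python
import math
from typing import Sequence


def quantile_sketch(values: Sequence[float], q: float, interpolation: str = "linear") -> float:
    """Hypothetical helper: the q-th quantile with the given interpolation."""
    if not values:
        raise ValueError("empty input")
    if not 0 <= q <= 1:
        raise ValueError("q must be in [0, 1]")
    data = sorted(values)
    pos = q * (len(data) - 1)  # fractional position in the sorted data
    i, j = data[math.floor(pos)], data[math.ceil(pos)]
    fraction = pos - math.floor(pos)
    if interpolation == "linear":
        return i + (j - i) * fraction
    if interpolation == "lower":
        return i
    if interpolation == "higher":
        return j
    if interpolation == "midpoint":
        return (i + j) / 2
    if interpolation == "nearest":
        return data[round(pos)]
    raise ValueError(f"unknown interpolation: {interpolation!r}")


for mode in ("linear", "lower", "higher", "midpoint", "nearest"):
    print(mode, quantile_sketch([1, 2, 3, 4], 0.25, mode))
# linear 1.75 / lower 1 / higher 2 / midpoint 1.5 / nearest 2
```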
-
radd(other, fill_value=None, axis=0)
Addition of series and other, element-wise (binary operator radd). Equivalent to other + series.
- Parameters
- other : Series or scalar value
- fill_value : None or value
Value to fill nulls with before computation. If data in both corresponding Series locations is null, the result will be null.
-
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)
Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.
- Parameters
- axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Index to direct ranking.
- method : {‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’
How to rank the group of records that have the same value (i.e. ties):
* average: average rank of the group
* min: lowest rank in the group
* max: highest rank in the group
* first: ranks assigned in the order they appear in the array
* dense: like ‘min’, but the rank always increases by 1 between groups
- numeric_only : bool, optional
For DataFrame objects, rank only numeric columns if set to True.
- na_option : {‘keep’, ‘top’, ‘bottom’}, default ‘keep’
How to rank NaN values:
* keep: assign NaN rank to NaN values
* top: assign the smallest rank to NaN values if ascending
* bottom: assign the highest rank to NaN values if ascending
- ascending : bool, default True
Whether or not the elements should be ranked in ascending order.
- pct : bool, default False
Whether or not to display the returned rankings in percentile form.
- Returns
- same type as caller
Return a Series or DataFrame with data ranks as values.
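Here is a plain-Python sketch of the five tie-breaking methods for ascending ranks without nulls, so na_option and pct are not modelled. The helper name is hypothetical and this is not cuDF's implementation: it sorts positions by value, then walks each run of equal values and assigns ranks according to method.

```python
from typing import List, Sequence


def rank_sketch(values: Sequence[float], method: str = "average") -> List[float]:
    """Hypothetical helper: 1-based ascending ranks with tie handling."""
    n = len(values)
    # sorted() is stable, so tied values keep their original order ('first').
    order = sorted(range(n), key=lambda k: values[k])
    ranks = [0.0] * n
    dense_rank = 0
    start = 0
    while start < n:
        # Extend [start, end] over the run of values equal to values[order[start]].
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        dense_rank += 1
        for offset, k in enumerate(order[start:end + 1]):
            if method == "average":
                rank = (start + end) / 2 + 1
            elif method == "min":
                rank = start + 1
            elif method == "max":
                rank = end + 1
            elif method == "first":
                rank = start + offset + 1
            elif method == "dense":
                rank = dense_rank
            else:
                raise ValueError(f"unknown method: {method!r}")
            ranks[k] = float(rank)
        start = end + 1
    return ranks


for m in ("average", "min", "max", "first", "dense"):
    print(m, rank_sketch([10, 20, 20, 30], m))
# average [1.0, 2.5, 2.5, 4.0]   min [1.0, 2.0, 2.0, 4.0]   max [1.0, 3.0, 3.0, 4.0]
# first [1.0, 2.0, 3.0, 4.0]     dense [1.0, 2.0, 2.0, 3.0]
```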
-
reindex(index=None, copy=True)
Return a Series that conforms to a new index.
- Parameters
- index : Index, Series-convertible, default None
- copy : boolean, default True
- Returns
- A new Series that conforms to the supplied index
-
rename(index=None, copy=True)
Alter Series name.
Change Series.name with a scalar value.
- Parameters
- index : Scalar, optional
Scalar to alter the Series.name attribute.
- copy : boolean, default True
Also copy underlying data.
- Returns
- Series
Notes
- Difference from pandas:
Supports scalar values only for changing the name attribute.
Not supported: inplace, level.
-
repeat(repeats, axis=None)
Repeat elements consecutively.
Returns a new object of caller type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.
- Parameters
- repeats : int, or array of ints
The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.
- Returns
- Series/DataFrame/Index
A newly created object of same type as caller with repeated elements.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})
>>> df
   a   b
0  1  10
1  2  20
2  3  30
>>> df.repeat(3)
   a   b
0  1  10
0  1  10
0  1  10
1  2  20
1  2  20
1  2  20
2  3  30
2  3  30
2  3  30
Repeat on Series
>>> s = cudf.Series([0, 2])
>>> s
0    0
1    2
dtype: int64
>>> s.repeat([3, 4])
0    0
0    0
0    0
1    2
1    2
1    2
1    2
dtype: int64
>>> s.repeat(2)
0    0
0    0
1    2
1    2
dtype: int64
Repeat on Index
>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33, 33, 33, 33, 33, 55, 55,
            55, 55, 55],
           dtype='int64')
-
replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method=None)
Replace values given in to_replace with value.
- Parameters
- to_replace : numeric, str or list-like
Value(s) to replace.
- numeric or str:
Values equal to to_replace will be replaced with value.
- list of numeric or str:
If value is also list-like, to_replace and value must be of the same length.
- value : numeric, str, list-like, or dict
Value(s) to replace to_replace with.
- inplace : bool, default False
If True, perform the replacement in place.
- Returns
- result : Series
Series after replacement. The mask and index are preserved.
Notes
Parameters that are currently not supported are: limit, regex, method
-
reset_index(drop=False, inplace=False)
Reset index to RangeIndex.
-
reverse()
Reverse the Series.
-
rfloordiv(other, fill_value=None, axis=0)
Integer division of series and other, element-wise (binary operator rfloordiv). Equivalent to other // series.
- Parameters
- other : Series or scalar value
- fill_value : None or value
Value to fill nulls with before computation. If data in both corresponding Series locations is null, the result will be null.
- Returns
- Series
Result of the arithmetic operation.
Examples
>>> import cudf
>>> s = cudf.Series([1, 2, 10, 17])
>>> s
0     1
1     2
2    10
3    17
dtype: int64
>>> s.rfloordiv(100)
0    100
1     50
2     10
3      5
dtype: int64
>>> s = cudf.Series([10, 20, None])
>>> s
0      10
1      20
2    null
dtype: int64
>>> s.rfloordiv(200)
0      20
1      10
2    null
dtype: int64
>>> s.rfloordiv(200, fill_value=2)
0     20
1     10
2    100
dtype: int64
-
rmod(other, fill_value=None, axis=0)
Modulo of series and other, element-wise (binary operator rmod). Equivalent to other % series.
- Parameters
- other : Series or scalar value
- fill_value : None or value
Value to fill nulls with before computation. If data in both corresponding Series locations is null, the result will be null.
-
rmul(other, fill_value=None, axis=0)
Multiplication of series and other, element-wise (binary operator rmul). Equivalent to other * series.
- Parameters
- other : Series or scalar value
- fill_value : None or value
Value to fill nulls with before computation. If data in both corresponding Series locations is null, the result will be null.
-
rolling(window, min_periods=None, center=False, axis=0, win_type=None)
Rolling window calculations.
- Parameters
- window : int or offset
Size of the window, i.e., the number of observations used to calculate the statistic. For datetime indexes, an offset can be provided instead of an int. The offset must be convertible to a timedelta. As opposed to a fixed window size, each window will be sized to accommodate observations within the time period specified by the offset.
- min_periods : int, optional
The minimum number of observations in the window that are required to be non-null, so that the result is non-null. If not provided or None, min_periods is equal to the window size.
- center : bool, optional
If True, the result is set at the center of the window. If False (default), the result is set at the right edge of the window.
- Returns
Rolling object.
Examples
>>> import cudf
>>> a = cudf.Series([1, 2, 3, None, 4])
Rolling sum with window size 2.
>>> print(a.rolling(2).sum())
0    null
1       3
2       5
3    null
4    null
dtype: int64
Rolling sum with window size 2 and min_periods 1.
>>> print(a.rolling(2, min_periods=1).sum())
0    1
1    3
2    5
3    3
4    4
dtype: int64
Rolling count with window size 3.
>>> print(a.rolling(3).count())
0    1
1    2
2    3
3    2
4    2
dtype: int64
Rolling count with window size 3, but with the result set at the center of the window.
>>> print(a.rolling(3, center=True).count())
0    2
1    3
2    2
3    2
4    1
dtype: int64
Rolling max with variable window size specified by an offset; only valid for datetime index.
>>> import numpy as np
>>> import pandas as pd
>>> a = cudf.Series(
...     [1, 9, 5, 4, np.nan, 1],
...     index=[
...         pd.Timestamp('20190101 09:00:00'),
...         pd.Timestamp('20190101 09:00:01'),
...         pd.Timestamp('20190101 09:00:02'),
...         pd.Timestamp('20190101 09:00:04'),
...         pd.Timestamp('20190101 09:00:07'),
...         pd.Timestamp('20190101 09:00:08')
...     ]
... )
>>> print(a.rolling('2s').max())
2019-01-01T09:00:00.000       1
2019-01-01T09:00:01.000       9
2019-01-01T09:00:02.000       9
2019-01-01T09:00:04.000       4
2019-01-01T09:00:07.000    null
2019-01-01T09:00:08.000       1
dtype: int64
Apply a custom function on the window with the apply method:
>>> import math
>>> b = cudf.Series([16, 25, 36, 49, 64, 81], dtype=np.float64)
>>> def some_func(A):
...     total = 0
...     for a in A:
...         total = total + math.sqrt(a)
...     return total
...
>>> print(b.rolling(3, min_periods=1).apply(some_func))
0     4.0
1     9.0
2    15.0
3    18.0
4    21.0
5    24.0
dtype: float64
This also works for rolling windows set by an offset:
>>> c = cudf.Series(
...     [16, 25, 36, 49, 64, 81],
...     index=[
...         pd.Timestamp('20190101 09:00:00'),
...         pd.Timestamp('20190101 09:00:01'),
...         pd.Timestamp('20190101 09:00:02'),
...         pd.Timestamp('20190101 09:00:04'),
...         pd.Timestamp('20190101 09:00:07'),
...         pd.Timestamp('20190101 09:00:08')
...     ],
...     dtype=np.float64
... )
>>> print(c.rolling('2s').apply(some_func))
2019-01-01T09:00:00.000     4.0
2019-01-01T09:00:01.000     9.0
2019-01-01T09:00:02.000    11.0
2019-01-01T09:00:04.000     7.0
2019-01-01T09:00:07.000     8.0
2019-01-01T09:00:08.000    17.0
dtype: float64
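The fixed-window sum examples above can be reproduced with a plain-Python sketch of a rolling sum, using None for null. The helper is hypothetical and only models integer windows (no offsets, no center): a window covers the current element and the window - 1 elements before it, nulls are skipped, and the result is null unless at least min_periods values are valid.

```python
from typing import List, Optional, Sequence


def rolling_sum_sketch(
    values: Sequence[Optional[float]], window: int, min_periods: Optional[int] = None
) -> List[Optional[float]]:
    """Hypothetical helper: right-aligned rolling sum with min_periods."""
    if min_periods is None:
        min_periods = window  # documented default: equal to the window size
    result: List[Optional[float]] = []
    for end in range(len(values)):
        start = max(0, end - window + 1)
        valid = [v for v in values[start:end + 1] if v is not None]
        result.append(sum(valid) if len(valid) >= min_periods else None)
    return result


print(rolling_sum_sketch([1, 2, 3, None, 4], 2))                 # [None, 3, 5, None, None]
print(rolling_sum_sketch([1, 2, 3, None, 4], 2, min_periods=1))  # [1, 3, 5, 3, 4]
```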
-
round(decimals=0)
Round a Series to a configurable number of decimal places.
-
rpow(other, fill_value=None, axis=0)
Exponential power of series and other, element-wise (binary operator rpow). Equivalent to other ** series.
- Parameters
- other : Series or scalar value
- fill_value : None or value
Value to fill nulls with before computation. If data in both corresponding Series locations is null, the result will be null.
-
rsub(other, fill_value=None, axis=0)
Subtraction of series and other, element-wise (binary operator rsub). Equivalent to other - series.
- Parameters
- other : Series or scalar value
- fill_value : None or value
Value to fill nulls with before computation. If data in both corresponding Series locations is null, the result will be null.
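For the reflected operators the operands are swapped, which matters for non-commutative operations such as subtraction: rsub computes other - series. Here is a plain-Python sketch with None for null and a hypothetical helper name, assuming a scalar other is broadcast to every position and the fill_value rule described above applies unchanged.

```python
from typing import List, Optional, Sequence, Union

Num = Union[int, float]


def rsub_sketch(
    series: Sequence[Optional[Num]],
    other: Union[Num, Sequence[Optional[Num]]],
    fill_value: Optional[Num] = None,
) -> List[Optional[Num]]:
    """Hypothetical helper mimicking series.rsub(other, fill_value)."""
    if isinstance(other, (int, float)):
        other = [other] * len(series)  # broadcast a scalar operand
    result: List[Optional[Num]] = []
    for s, o in zip(series, other):
        if s is None and o is None:
            result.append(None)  # null on both sides stays null
            continue
        if fill_value is not None:
            s = fill_value if s is None else s
            o = fill_value if o is None else o
        # Reflected order: other - series, not series - other.
        result.append(None if s is None or o is None else o - s)
    return result


print(rsub_sketch([1, 2, None], 10))                # [9, 8, None]
print(rsub_sketch([1, 2, None], 10, fill_value=0))  # [9, 8, 10]
```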
-
rtruediv(other, fill_value=None, axis=0)
Floating division of series and other, element-wise (binary operator rtruediv). Equivalent to other / series.
- Parameters
- other : Series or scalar value
- fill_value : None or value
Value to fill nulls with before computation. If data in both corresponding Series locations is null, the result will be null.
-
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)
Return a random sample of items from an axis of object.
You can use random_state for reproducibility.
- Parameters
- n : int, optional
Number of items from axis to return. Cannot be used with frac. Defaults to 1 if frac is None.
- frac : float, optional
Fraction of axis items to return. Cannot be used with n.
- replace : bool, default False
Allow or disallow sampling of the same row more than once. replace=True is not yet supported for axis=1/‘columns’.
- weights : str or ndarray-like, optional
Only supported for axis=1/‘columns’.
- random_state : int or None, default None
Seed for the random number generator (if int), or None. If None, a random seed will be chosen.
- axis : {0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index don’t support axis=1.
- Returns
- Series or DataFrame or Index
A new object of same type as caller containing n items randomly sampled from the caller object.
Examples
>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1
>>> sr = cudf.Series([1, 2, 3, 4, 5])
>>> sr.sample(10, replace=True)
1    4
3    1
2    4
0    5
0    1
4    5
4    1
0    2
0    3
3    2
dtype: int64
>>> df = cudf.DataFrame(
...     {"a": [1, 2], "b": [2, 3], "c": [3, 4], "d": [4, 5]})
>>> df.sample(2, axis=1)
   a  c
0  1  3
1  2  4
-
scale()
Scale values to [0, 1] in float64.
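One plausible reading of "scale values to [0, 1]" is min-max scaling, (x - min) / (max - min). The following plain-Python sketch assumes that formula, which the description above does not spell out; the helper name is hypothetical.

```python
from typing import List, Sequence


def scale_sketch(values: Sequence[float]) -> List[float]:
    """Hypothetical helper: min-max scale values into [0, 1] as floats."""
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    if span == 0:
        raise ZeroDivisionError("all values are equal; the scale is undefined")
    return [(v - vmin) / span for v in values]


print(scale_sketch([2, 4, 6]))  # [0.0, 0.5, 1.0]
```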
-
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)
Scatter to a list of dataframes.
Uses map_index to determine the destination of each row of the original DataFrame.
- Parameters
- map_index : Series, str or list-like
Scatter assignment for each row.
- map_size : int
Length of output list. Must be at least the number of unique values in map_index.
- keep_index : bool
Conserve original index values for each row.
- Returns
- A list of cudf.DataFrame objects.
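Here is a plain-Python sketch of the scatter step, in which each row goes to the output partition named by its entry in map_index. The helper is hypothetical. It assumes map_index holds integers in [0, map_size), and when map_size is omitted it uses max(map_index) + 1 partitions, which is an assumption rather than documented behaviour.

```python
from typing import List, Optional, Sequence, TypeVar

Row = TypeVar("Row")


def scatter_by_map_sketch(
    rows: Sequence[Row], map_index: Sequence[int], map_size: Optional[int] = None
) -> List[List[Row]]:
    """Hypothetical helper: split rows into partitions chosen by map_index."""
    if len(rows) != len(map_index):
        raise ValueError("map_index needs one entry per row")
    if map_size is None:
        map_size = max(map_index, default=-1) + 1
    partitions: List[List[Row]] = [[] for _ in range(map_size)]
    for row, dest in zip(rows, map_index):
        partitions[dest].append(row)  # row order is kept within a partition
    return partitions


print(scatter_by_map_sketch(["r0", "r1", "r2", "r3"], [1, 0, 1, 2]))
# [['r1'], ['r0', 'r2'], ['r3']]
```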
-
searchsorted(values, side='left', ascending=True, na_position='last')
Find indices where elements should be inserted to maintain order.
- Parameters
- values : Frame (Shape must be consistent with self)
Values to be hypothetically inserted into self.
- side : str {‘left’, ‘right’}, optional, default ‘left’
If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.
- ascending : bool, optional, default True
Sorted Frame is in ascending order (otherwise descending).
- na_position : str {‘last’, ‘first’}, optional, default ‘last’
Position of null values in sorted order.
- Returns
- 1-D cupy array of insertion points
Examples
>>> import cudf
>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)
If the values are not monotonically sorted, wrong locations may be returned:
>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0  # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
...                             'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
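For a single ascending column without nulls, the side='left'/'right' semantics match Python's bisect module. Here is a sketch with a hypothetical helper name that reproduces the Series examples above. It does not model ascending=False or na_position.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence


def searchsorted_sketch(
    sorted_values: Sequence[int], values: Sequence[int], side: str = "left"
) -> List[int]:
    """Hypothetical helper: insertion points that keep sorted_values ordered."""
    if side == "left":
        find = bisect_left   # first suitable position (before equal elements)
    elif side == "right":
        find = bisect_right  # last suitable position (after equal elements)
    else:
        raise ValueError("side must be 'left' or 'right'")
    return [find(sorted_values, v) for v in values]


print(searchsorted_sketch([1, 2, 3], [0, 4]))                # [0, 3]
print(searchsorted_sketch([1, 2, 3], [1, 3], side="right"))  # [1, 3]
```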
-
set_index(index)
Returns a new Series with a different index.
- Parameters
- index : Index, Series-convertible
The new index or values for the new index.
-
set_mask(mask, null_count=None)¶ Create new Series by setting a mask array.
This will override the existing mask. The returned Series will reference the same data buffer as this Series.
- Parameters
- mask : 1D array-like
The null mask. Valid values are marked as 1; otherwise 0. The mask bit for data index idx is computed as: (mask[idx // 8] >> (idx % 8)) & 1
- null_count : int, optional
The number of null values. If None, it is calculated automatically.
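The bit formula above means the mask is packed least-significant bit first, with eight rows per byte. The small pure-Python sketch below builds and reads such a mask. `pack_validity` and `is_valid` are hypothetical helpers. Any padding or alignment that cuDF may require of the buffer is not modeled:

```python
def pack_validity(valid):
    """Hypothetical helper: pack booleans (True = valid) into bytes, LSB first."""
    mask = bytearray((len(valid) + 7) // 8)
    for idx, ok in enumerate(valid):
        if ok:
            # Row idx lives in byte idx // 8, at bit position idx % 8.
            mask[idx // 8] |= 1 << (idx % 8)
    return bytes(mask)


def is_valid(mask, idx):
    """Read one validity bit using the formula from the set_mask docs."""
    return (mask[idx // 8] >> (idx % 8)) & 1


valid = [True, False, True, True, False, False, False, False, True]
mask = pack_validity(valid)
print(list(mask))                                      # [13, 1]
print([is_valid(mask, i) for i in range(len(valid))])  # [1, 0, 1, 1, 0, 0, 0, 0, 1]
```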
-
property
shape¶ Returns a tuple representing the dimensionality of the Series.
-
shift(periods=1, freq=None, axis=0, fill_value=None)¶ Shift values by periods positions.
-
sin()¶ Get the trigonometric sine, element-wise. Input values are interpreted as radians.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.sin()
0    0.000000
1    0.318683
2    0.479426
3    0.850904
4    0.893997
5   -0.801153
6    0.958916
dtype: float64
sin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.sin()
      first    second
0  0.000000 -0.506366
1 -0.958924  0.958916
2 -0.544021 -0.544072
3  0.650288 -0.999756
sin operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.sin()
Float64Index([-0.3894183423086505, -0.5063656411097588, 0.8011526357338306, 0.8939966636005579], dtype='float64')
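Because the inputs are treated as radians, `math.sin` in Python's standard library reproduces the Series values above. Data stored in degrees must be converted first, for example with `math.radians`. The same consideration applies to tan(). The following is a short check; the variable names are illustrative:

```python
import math

values = [0.5, 45, 90, 180]

# Interpret each value as radians, as sin() does in the examples above.
as_radians = [round(math.sin(x), 6) for x in values]

# If the data are really degrees, convert before taking the sine.
as_degrees = [round(math.sin(math.radians(x)), 6) for x in values]

print(as_radians)  # [0.479426, 0.850904, 0.893997, -0.801153]
print(as_degrees)
```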
-
property
size¶ Return the number of elements in the underlying data.
- Returns
- size : Size of the DataFrame / Index / Series / MultiIndex
Examples
Size of an empty dataframe is 0.
>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0
DataFrame with values:
>>> df = cudf.DataFrame({'a': [10, 11, 12],
...                      'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3
Size of an Index:
>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Int64Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4
Size of a MultiIndex:
>>> midx = cudf.MultiIndex(
...     levels=[["a", "b", "c", None], ["1", None, "5"]],
...     codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...     names=["x", "y"],
... )
>>> midx
MultiIndex(levels=[0       a
1       b
2       c
3    None
dtype: object, 0       1
1    None
2       5
dtype: object],
codes=   x  y
0  0  0
1  0  2
2  1  1
3  2  1
4  3  0)
>>> midx.size
5
-
skew(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return unbiased Fisher-Pearson skew of a sample.
- Parameters
- skipna : bool, default True
Exclude NA/null values when computing the result.
- Returns
- scalar
Notes
Parameters currently not supported are axis, level and numeric_only.
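The docstring does not spell out the estimator. The pure-Python sketch below assumes it is the adjusted Fisher-Pearson coefficient G1, the one pandas uses; that match is an assumption about cuDF. Nulls are modeled as None and dropped, as skipna=True would do. `sample_skew` is a hypothetical helper:

```python
import math


def sample_skew(values):
    """Adjusted Fisher-Pearson skewness G1 (assumed estimator).

    G1 = sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5, where m2 and m3 are
    the second and third central moments. None values are dropped first.
    """
    xs = [v for v in values if v is not None]
    n = len(xs)
    if n < 3:
        return float("nan")  # the estimator needs at least three values
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    if m2 == 0:
        return 0.0  # constant data; returning 0.0 is an assumption
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


print(sample_skew([1, 2, 3, 10, None]))  # about 1.7636 (right-skewed)
```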
-
sort_index(ascending=True)¶ Sort by the index.
-
sort_values(axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False)¶ Sort by the values.
Sort a Series in ascending or descending order by some criterion.
- Parameters
- ascending : bool, default True
If True, sort values in ascending order, otherwise descending.
- na_position : {‘first’, ‘last’}, default ‘last’
‘first’ puts nulls at the beginning, ‘last’ puts nulls at the end.
- ignore_index : bool, default False
If True, the index is left in its original order instead of being reordered along with the values.
- Returns
- sorted_obj : cuDF Series
Notes
- Difference from pandas:
The inplace and kind parameters are not supported.
Examples
>>> import cudf
>>> s = cudf.Series([1, 5, 2, 4, 3])
>>> s.sort_values()
0    1
2    2
4    3
3    4
1    5
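Plain Python sorting can model how na_position and ascending interact. In this hypothetical sketch, original positions stand in for index labels and None stands in for a null. It does not claim to match cuDF's sort algorithm or its ordering of ties:

```python
def sort_with_nulls(values, ascending=True, na_position="last"):
    """Return (original_position, value) pairs sorted by value; None is null."""
    present = [(i, v) for i, v in enumerate(values) if v is not None]
    nulls = [(i, v) for i, v in enumerate(values) if v is None]
    # Python's sort is stable, also with reverse=True.
    present.sort(key=lambda pair: pair[1], reverse=not ascending)
    return nulls + present if na_position == "first" else present + nulls


print(sort_with_nulls([1, 5, None, 4, 3]))
# [(0, 1), (4, 3), (3, 4), (1, 5), (2, None)]
print(sort_with_nulls([1, 5, None, 4, 3], ascending=False, na_position="first"))
# [(2, None), (1, 5), (3, 4), (4, 3), (0, 1)]
```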
-
sqrt()¶ Get the non-negative square-root of all elements, element-wise.
- Returns
- DataFrame/Series/Index
Result of the non-negative square-root of each element.
Examples
>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64
sqrt operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-10.0, 100, 625],
...                      'second': [1, 2, 0.4]})
>>> df
   first  second
0  -10.0     1.0
1  100.0     2.0
2  625.0     0.4
>>> df.sqrt()
   first    second
0    NaN  1.000000
1   10.0  1.414214
2   25.0  0.632456
sqrt operation on Index:
>>> index = cudf.Index([-10.0, 100, 625])
>>> index
Float64Index([-10.0, 100.0, 625.0], dtype='float64')
>>> index.sqrt()
Float64Index([nan, 10.0, 25.0], dtype='float64')
-
std(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)¶ Return sample standard deviation of the Series.
Normalized by N-1 by default. This can be changed using the ddof argument.
- Parameters
- skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- ddof : int, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
- Returns
- scalar
Notes
Parameters currently not supported are axis, level and numeric_only.
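A small pure-Python sketch shows how ddof sets the divisor N - ddof. `variance` and `std` are hypothetical helpers that drop None values to mirror skipna=True. The same divisor rule applies to var(). Returning NaN when N - ddof is not positive is an assumption of the sketch:

```python
import math


def variance(values, ddof=1):
    """Variance with divisor N - ddof; None values are skipped."""
    xs = [v for v in values if v is not None]
    n = len(xs)
    if n - ddof <= 0:
        return float("nan")  # assumed behaviour when the divisor is not positive
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - ddof)


def std(values, ddof=1):
    """Standard deviation as the square root of variance(values, ddof)."""
    return math.sqrt(variance(values, ddof))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(std(data, ddof=0))  # 2.0 (population, divisor N = 8)
print(variance(data))     # 32 / 7, about 4.571 (sample, divisor N - 1 = 7)
```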
-
property
str¶ Vectorized string functions for Series and Index.
This mimics the pandas df.str interface. Nulls stay null unless handled otherwise by a particular method. Patterned after Python’s string methods, with some inspiration from R’s stringr package.
-
sub(other, fill_value=None, axis=0)¶ Subtraction of series and other, element-wise (binary operator sub).
- Parameters
- other : Series or scalar value
- fill_value : None or value
Value to fill nulls with before computation. If data in both corresponding Series locations is null, the result will be null.
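The fill_value rule can be modeled element-wise in pure Python. It applies to sub() and to the other binary operators in this section, such as truediv(). In this sketch, `binary_op_with_fill` is a hypothetical helper and None stands for a null. A null on one side is replaced by fill_value, while a pair of nulls stays null:

```python
import operator


def binary_op_with_fill(left, right, op, fill_value=None):
    """Element-wise op where None marks a null (hypothetical model)."""
    out = []
    for a, b in zip(left, right):
        if a is None and b is None:
            out.append(None)  # both sides null: result stays null
            continue
        if fill_value is not None:
            a = fill_value if a is None else a
            b = fill_value if b is None else b
        # Without a fill_value, any remaining null propagates.
        out.append(None if a is None or b is None else op(a, b))
    return out


print(binary_op_with_fill([10, None, 5, None], [1, 2, None, None],
                          operator.sub, fill_value=0))
# [9, -2, 5, None]
```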
-
sum(axis=None, skipna=None, dtype=None, level=None, numeric_only=None, min_count=0, **kwargs)¶ Return sum of the values in the Series.
- Parameters
- skipna : bool, default True
Exclude NA/null values when computing the result.
- dtype : data type
Data type to cast the result to.
- min_count : int, default 0
The required number of valid values to perform the operation. If fewer than min_count non-NA values are present, the result will be NA.
The default is 0, which means the sum of an all-NA or empty Series is 0, and the product of an all-NA or empty Series is 1.
- Returns
- scalar
Notes
Parameters currently not supported are axis, level, numeric_only.
Examples
>>> import cudf
>>> ser = cudf.Series([1, 5, 2, 4, 3])
>>> ser.sum()
15
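The min_count rule can be sketched in pure Python. `sum_with_min_count` is hypothetical, and None stands for both a null input and an NA result:

```python
def sum_with_min_count(values, min_count=0):
    """Sum of non-null values, or None (NA) if fewer than min_count are present."""
    xs = [v for v in values if v is not None]
    if len(xs) < min_count:
        return None
    return sum(xs)  # sum([]) is 0, so an all-null input gives 0 when min_count=0


print(sum_with_min_count([None, None]))               # 0
print(sum_with_min_count([None, None], min_count=1))  # None
print(sum_with_min_count([1, None, 4], min_count=2))  # 5
```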
-
tail(n=5)¶ Returns the last n rows as a new Series.
Examples
>>> import cudf
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> print(ser.tail(2))
3    1
4    0
-
take(indices, keep_index=True)¶ Return Series by taking values from the corresponding indices.
-
tan()¶ Get the trigonometric tangent, element-wise. Input values are interpreted as radians.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.tan()
0    0.000000
1    0.336213
2    0.546302
3    1.619775
4   -1.995200
5    1.338690
6   -3.380140
dtype: float64
tan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.tan()
      first     second
0  0.000000  -0.587214
1 -3.380515  -3.380140
2  0.648361   0.648446
3 -0.855993  45.244742
tan operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.tan()
Float64Index([-0.4227932187381618, -0.587213915156929, -1.3386902103511544, -1.995200412208242], dtype='float64')
-
tile(count)¶ Repeats the rows from self DataFrame count times to form a new DataFrame.
- Parameters
- self : input table containing the columns to interleave.
- count : Number of times to tile the rows. Must be non-negative.
- Returns
- The table containing the tiled rows.
Examples
>>> import cudf
>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
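The tiling rule repeats the whole block of rows, and the index labels come along with it. The example output above shows this as 0, 1, 0, 1. The following is a pure-Python model; `tile_rows` is a hypothetical helper, not cuDF code:

```python
def tile_rows(index, rows, count):
    """Hypothetical model of DataFrame.tile: repeat all rows `count` times."""
    if count < 0:
        raise ValueError("count must be non-negative")
    # List repetition reuses the same inner row lists; fine for read-only use.
    return index * count, rows * count


idx, rows = tile_rows([0, 1], [[8, 4, 7], [5, 2, 3]], 2)
print(idx)   # [0, 1, 0, 1]
print(rows)  # [[8, 4, 7], [5, 2, 3], [8, 4, 7], [5, 2, 3]]
```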
-
to_array(fillna=None)¶ Get a dense numpy array for the data.
- Parameters
- fillna : str or None
Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non-integral dtypes are promoted to np.float64.
Notes
If fillna is None, null values are skipped. Therefore, the output may be shorter than the Series.
-
to_arrow()¶ Convert Series to a PyArrow Array.
Examples
>>> import cudf
>>> ser = cudf.Series([-3, 10, 15, 20])
>>> ser.to_arrow()
<pyarrow.lib.Int64Array object at 0x7f5e769499f0>
[
  -3,
  10,
  15,
  20
]
-
to_dlpack()¶ Converts a cuDF object into a DLPack tensor.
DLPack is an open-source memory tensor structure: dmlc/dlpack.
This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.
- Parameters
- cudf_obj : DataFrame, Series, Index, or Column
- Returns
- pycapsule_obj : PyCapsule
Output DLPack tensor pointer which is encapsulated in a PyCapsule object.
-
to_frame(name=None)¶ Convert Series into a DataFrame.
- Parameters
- name : str, default None
Name to be used for the column
- Returns
- DataFrame
cudf DataFrame
-
to_gpu_array(fillna=None)¶ Get a dense numba device array for the data.
- Parameters
- fillna : str or None
See fillna in .to_array.
Notes
If fillna is None, null values are skipped. Therefore, the output size could be smaller.
-
to_hdf(path_or_buf, key, *args, **kwargs)¶ Write the contained data to an HDF5 file using HDFStore.
Hierarchical Data Format (HDF) is self-describing, allowing an application to interpret the structure and contents of a file with no outside information. One HDF file can hold a mix of related objects which can be accessed as a group or as individual objects.
To add another DataFrame or Series to an existing HDF file, use append mode and a different key.
For more information see the user guide.
- Parameters
- path_or_buf : str or pandas.HDFStore
File path or HDFStore object.
- key : str
Identifier for the group in the store.
- mode : {‘a’, ‘w’, ‘r+’}, default ‘a’
Mode to open file:
‘w’: write, a new file is created (an existing file with the same name would be deleted).
‘a’: append, an existing file is opened for reading and writing, and if the file does not exist it is created.
‘r+’: similar to ‘a’, but the file must already exist.
- format : {‘fixed’, ‘table’}, default ‘fixed’
Possible values:
‘fixed’: Fixed format. Fast writing/reading. Neither appendable nor searchable.
‘table’: Table format. Write as a PyTables Table structure, which may perform worse but allows more flexible operations like searching / selecting subsets of the data.
- append : bool, default False
For Table formats, append the input data to the existing data.
- data_columns : list of columns or True, optional
List of columns to create as indexed data columns for on-disk queries, or True to use all columns. By default only the axes of the object are indexed. See Query via Data Columns. Applicable only to format='table'.
- complevel : {0-9}, optional
Specifies a compression level for data. A value of 0 disables compression.
- complib : {‘zlib’, ‘lzo’, ‘bzip2’, ‘blosc’}, default ‘zlib’
Specifies the compression library to be used. As of v0.20.2 these additional compressors for Blosc are supported (default if no compressor specified: ‘blosc:blosclz’): {‘blosc:blosclz’, ‘blosc:lz4’, ‘blosc:lz4hc’, ‘blosc:snappy’, ‘blosc:zlib’, ‘blosc:zstd’}. Specifying a compression library which is not available issues a ValueError.
- fletcher32 : bool, default False
If applying compression, use the fletcher32 checksum.
- dropna : bool, default False
If True, rows that are entirely NaN will not be written to the store.
- errors : str, default ‘strict’
Specifies how encoding and decoding errors are to be handled. See the errors argument for open() for a full list of options.
See also
cudf.io.hdf.read_hdf : Read from HDF file.
cudf.io.parquet.to_parquet : Write a DataFrame to the binary parquet format.
cudf.io.feather.to_feather : Write out feather-format for DataFrames.
-
to_json(path_or_buf=None, *args, **kwargs)¶ Convert the cuDF object to a JSON string. Note nulls and NaNs will be converted to null and datetime objects will be converted to UNIX timestamps.
- Parameters
- path_or_buf : string or file handle, optional
File path or object. If not specified, the result is returned as a string.
- orient : string
Indication of expected JSON string format.
- Series
default is ‘index’
allowed values are: {‘split’, ‘records’, ‘index’, ‘table’}
- DataFrame
default is ‘columns’
allowed values are: {‘split’, ‘records’, ‘index’, ‘columns’, ‘values’, ‘table’}
- The format of the JSON string
‘split’ : dict like {‘index’ -> [index], ‘columns’ -> [columns], ‘data’ -> [values]}
‘records’ : list like [{column -> value}, … , {column -> value}]
‘index’ : dict like {index -> {column -> value}}
‘columns’ : dict like {column -> {index -> value}}
‘values’ : just the values array
‘table’ : dict like {‘schema’: {schema}, ‘data’: {data}} describing the data, where the data component is like orient='records'.
- date_format : {None, ‘epoch’, ‘iso’}
Type of date conversion. ‘epoch’ = epoch milliseconds, ‘iso’ = ISO8601. The default depends on the orient. For orient='table', the default is ‘iso’. For all other orients, the default is ‘epoch’.
- double_precision : int, default 10
The number of decimal places to use when encoding floating point values.
- force_ascii : bool, default True
Force encoded string to be ASCII.
- date_unit : string, default ‘ms’ (milliseconds)
The time unit to encode to, governs timestamp and ISO8601 precision. One of ‘s’, ‘ms’, ‘us’, ‘ns’ for second, millisecond, microsecond, and nanosecond respectively.
- default_handler : callable, default None
Handler to call if object cannot otherwise be converted to a suitable format for JSON. Should receive a single argument which is the object to convert and return a serializable object.
- lines : bool, default False
If ‘orient’ is ‘records’, write out line-delimited JSON format. Raises ValueError for any other ‘orient’, since the other formats are not list-like.
- compression : {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}
A string representing the compression to use in the output file, only used when the first argument is a filename. By default, the compression is inferred from the filename.
- index : bool, default True
Whether to include the index values in the JSON string. Not including the index (index=False) is only supported when orient is ‘split’ or ‘table’.
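The orient layouts listed above can be illustrated with the standard json module on a tiny two-row table. This is a hand-built illustration of the shapes, not output captured from cuDF, and the variable names are made up. Index labels become string keys because JSON object keys must be strings. json.dumps also inserts a space after each separator, whereas pandas-style writers may emit compact JSON:

```python
import json

# A tiny table: index labels, column names, and row-major values.
index = [0, 1]
columns = ["a", "b"]
data = [[1, "x"], [2, "y"]]

layouts = {
    "split": {"index": index, "columns": columns, "data": data},
    "records": [dict(zip(columns, row)) for row in data],
    "index": {str(i): dict(zip(columns, row)) for i, row in zip(index, data)},
    "columns": {c: {str(i): row[j] for i, row in zip(index, data)}
                for j, c in enumerate(columns)},
    "values": data,
}

for orient, layout in layouts.items():
    print(orient, json.dumps(layout))

# lines=True with orient='records': one JSON object per line.
records_lines = "\n".join(json.dumps(r) for r in layouts["records"])
print(records_lines)
```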
-
to_pandas(index=True, **kwargs)¶ Convert to a Pandas Series.
- Parameters
- index : bool, default True
If index is True, converts the index of cudf.Series and sets it to the pandas.Series. If index is False, no index conversion is performed and pandas.Series will assign a default index.
Examples
>>> import cudf
>>> ser = cudf.Series([-3, 2, 0])
>>> pds = ser.to_pandas()
>>> pds
0   -3
1    2
2    0
dtype: int64
>>> type(pds)
<class 'pandas.core.series.Series'>
-
to_string()¶ Convert to string.
Uses Pandas formatting internals to produce output identical to Pandas. Use the Pandas formatting settings directly in Pandas to control cuDF output.
-
truediv(other, fill_value=None, axis=0)¶ Floating division of series and other, element-wise (binary operator truediv).
- Parameters
- other : Series or scalar value
- fill_value : None or value
Value to fill nulls with before computation. If data in both corresponding Series locations is null, the result will be null.
-
unique()¶ Returns unique values of this Series.
-
property
valid_count¶ Number of non-null values.
-
value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)¶ Return a Series containing counts of unique values.
The resulting object will be in descending order so that the first element is the most frequently-occurring element. Excludes NA values by default.
- Parameters
- normalize : bool, default False
If True then the object returned will contain the relative frequencies of the unique values. normalize == True is not supported.
- sort : bool, default True
Sort by frequencies.
- ascending : bool, default False
Sort in ascending order.
- bins : int, optional
Rather than counting values, group them into half-open bins; works with numeric data. Not yet supported.
- dropna : bool, default True
Don’t include counts of NaN and None. dropna == False is not supported.
- Returns
- result : Series containing counts of unique values.
Examples
>>> import cudf
>>> sr = cudf.Series([1.0, 2.0, 2.0, 3.0, 3.0, 3.0, None])
>>> sr.value_counts(ascending=True)
1.0    1
2.0    2
3.0    3
dtype: int32
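The counting and ordering rules can be sketched with `collections.Counter`. `value_counts` below is a hypothetical pure-Python helper. It drops None only, not float NaN, and leaves the order of tied counts unspecified:

```python
from collections import Counter


def value_counts(values, ascending=False):
    """Hypothetical helper: (value, count) pairs ordered by count.

    None is treated as null and skipped (dropna=True). Float NaN values are
    not filtered here, and the order of tied counts is not specified.
    """
    counts = Counter(v for v in values if v is not None)
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=not ascending)


data = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, None]
print(value_counts(data, ascending=True))  # [(1.0, 1), (2.0, 2), (3.0, 3)]
print(value_counts(data))                  # [(3.0, 3), (2.0, 2), (1.0, 1)]
```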
-
property
values¶ Return a CuPy representation of the Series.
Only the values in the Series will be returned.
- Returns
- outcupy.ndarray
The values of the Series.
Examples
>>> import cudf
>>> ser = cudf.Series([1, -10, 100, 20])
>>> ser.values
array([  1, -10, 100,  20])
>>> type(ser.values)
<class 'cupy.core.core.ndarray'>
-
property
values_host¶ Return a numpy representation of the Series.
Only the values in the Series will be returned.
- Returns
- outnumpy.ndarray
The values of the Series.
Examples
>>> import cudf
>>> ser = cudf.Series([1, -10, 100, 20])
>>> ser.values_host
array([  1, -10, 100,  20])
>>> type(ser.values_host)
<class 'numpy.ndarray'>
-
values_to_string(nrows=None)¶ Returns a list of strings, one for each element.
-
var(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)¶ Return unbiased variance of the Series.
Normalized by N-1 by default. This can be changed using the ddof argument.
- Parameters
- skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- ddof : int, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
- Returns
- scalar
Notes
Parameters currently not supported are axis, level and numeric_only.
-
where(cond, other=None, inplace=False)¶ Replace values where the condition is False.
- Parameters
- cond : bool Series/DataFrame, array-like
Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.
- other : scalar, list of scalars, Series/DataFrame
Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.
For a DataFrame, other must be a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.
For a Series, other must be a scalar or a Series-like object of the same length.
- inplace : bool, default False
Whether to perform the operation in place on the data.
- Returns
- Same type as caller
Examples
>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.where(df % 2 == 0, [-1, -1])
   A  B
0 -1 -1
1  4 -1
2 -1  8
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.where(ser > 2, 10)
0     4
1     3
2    10
3    10
4    10
dtype: int64
>>> ser.where(ser > 2)
0       4
1       3
2    null
3    null
4    null
dtype: int64
Strings¶
-
class
cudf.core.column.string.StringMethods(column, parent=None)¶ Vectorized string functions for Series and Index.
This mimics the pandas df.str interface. Nulls stay null unless handled otherwise by a particular method. Patterned after Python’s string methods, with some inspiration from R’s stringr package.
Methods
byte_count(**kwargs) Computes the number of bytes of each string in the Series/Index.
capitalize(**kwargs) Convert strings in the Series/Index to be capitalized.
cat([others, sep, na_rep]) Concatenate strings in the Series/Index with given separator.
center(width[, fillchar]) Fill the left and right sides of strings in the Series/Index with an additional character.
character_ngrams([n]) Generate the n-grams from characters in a column of strings.
character_tokenize(**kwargs) Each string is split into individual characters.
code_points(**kwargs) Returns an array by filling it with the UTF-8 code point values for each character of each string.
contains(pat[, case, flags, na, regex]) Test if pattern or regex is contained within a string of a Series or Index.
count(pat[, flags]) Count occurrences of pattern in each string of the Series/Index.
detokenize(indices[, separator]) Combines tokens into strings by concatenating them in the order in which they appear in the indices column.
endswith(pat, **kwargs) Test if the end of each string element matches a pattern.
extract(pat[, flags, expand]) Extract capture groups in the regex pat as columns in a DataFrame.
filter_alphanum([repl, keep]) Remove non-alphanumeric characters from strings in this column.
filter_tokens(min_token_length[, …]) Remove tokens from within each string in the series that are smaller than min_token_length and optionally replace them with the replacement string.
find(sub[, start, end]) Return lowest indexes in each string in the Series/Index where the substring is fully contained between [start:end].
findall(pat[, flags]) Find all occurrences of pattern or regular expression in the Series/Index.
get([i]) Extract element from each component at specified position.
htoi() Returns integer value represented by each hex string.
index(sub[, start, end]) Return lowest indexes in each string where the substring is fully contained between [start:end].
insert([start, repl]) Insert the specified string into each string in the specified position.
ip2int() Convert IP address strings to integers.
is_consonant(position, **kwargs) Return true for strings where the character at position is a consonant.
is_vowel(position, **kwargs) Return true for strings where the character at position is a vowel – not a consonant.
isalnum(**kwargs) Check whether all characters in each string are alphanumeric.
isalpha(**kwargs) Check whether all characters in each string are alphabetic.
isdecimal(**kwargs) Check whether all characters in each string are decimal.
isdigit(**kwargs) Check whether all characters in each string are digits.
isempty(**kwargs) Check whether each string is an empty string.
isfloat(**kwargs) Check whether all characters in each string form a floating-point value.
ishex(**kwargs) Check whether all characters in each string form a hex integer.
isinteger(**kwargs) Check whether all characters in each string form an integer.
isipv4(**kwargs) Check whether all characters in each string form an IPv4 address.
islower(**kwargs) Check whether all characters in each string are lowercase.
isnumeric(**kwargs) Check whether all characters in each string are numeric.
isspace(**kwargs) Check whether all characters in each string are whitespace.
isupper(**kwargs) Check whether all characters in each string are uppercase.
join(sep) Join lists contained as elements in the Series/Index with passed delimiter.
len(**kwargs) Computes the length of each element in the Series/Index.
ljust(width[, fillchar]) Fill the right side of strings in the Series/Index with an additional character.
lower(**kwargs) Converts all characters to lowercase.
lstrip([to_strip]) Remove leading characters.
match(pat[, case, flags]) Determine if each string matches a regular expression.
ngrams([n, separator]) Generate the n-grams from a set of tokens; each record in the series is treated as a token.
ngrams_tokenize([n, delimiter, separator]) Generate the n-grams using tokens from each string.
normalize_characters([do_lower]) Normalize string characters for tokenizing.
normalize_spaces(**kwargs) Remove extra whitespace between tokens and trim whitespace from the beginning and the end of each string.
pad(width[, side, fillchar]) Pad strings in the Series/Index up to width.
partition([sep, expand]) Split the string at the first occurrence of sep.
porter_stemmer_measure(**kwargs) Compute the Porter Stemmer measure for each string.
replace(pat, repl[, n, case, flags, regex]) Replace occurrences of pattern/regex in the Series/Index with some other string.
replace_tokens(targets, replacements[, …]) The targets tokens are searched for within each string in the series and replaced with the corresponding replacements if found.
replace_with_backrefs(pat, repl, **kwargs) Use the repl back-ref template to create a new string with the extracted elements found using the pat expression.
rfind(sub[, start, end]) Return highest indexes in each string in the Series/Index where the substring is fully contained between [start:end].
rindex(sub[, start, end]) Return highest indexes in each string where the substring is fully contained between [start:end].
rjust(width[, fillchar]) Fill the left side of strings in the Series/Index with an additional character.
rpartition([sep, expand]) Split the string at the last occurrence of sep.
rsplit([pat, n, expand]) Split strings around given separator/delimiter.
rstrip([to_strip]) Remove trailing characters.
slice([start, stop, step]) Slice substrings from each element in the Series or Index.
slice_from(starts, stops, **kwargs) Return substring of each string using positions for each string.
slice_replace([start, stop, repl]) Replace the specified section of each string with a new string.
split([pat, n, expand]) Split strings around given separator/delimiter.
startswith(pat, **kwargs) Test if the start of each string element matches a pattern.
strip([to_strip]) Remove leading and trailing characters.
subword_tokenize(hash_file[, max_length, …]) Run CUDA BERT subword tokenizer on cuDF strings column.
swapcase(**kwargs) Change each lowercase character to uppercase and vice versa.
title(**kwargs) Uppercase the first letter of each word and lowercase the rest.
token_count([delimiter]) Each string is split into tokens using the provided delimiter, and the number of tokens is returned.
tokenize([delimiter]) Each string is split into tokens using the provided delimiter(s).
translate(table, **kwargs) Map all characters in the string through the given mapping table.
upper(**kwargs) Convert each string to uppercase.
url_decode(**kwargs) Returns a URL-decoded format of each string.
url_encode(**kwargs) Returns a URL-encoded format of each string.
wrap(width, **kwargs) Wrap long strings in the Series/Index to be formatted in paragraphs with length less than a given width.
zfill(width, **kwargs) Pad strings in the Series/Index by prepending ‘0’ characters.
-
byte_count(**kwargs)¶ Computes the number of bytes of each string in the Series/Index.
- Returns
- Series or Index of int
A Series or Index of integer values indicating the number of bytes of each string in the Series or Index.
Examples
>>> import cudf
>>> s = cudf.Series(["abc","d","ef"])
>>> s.str.byte_count()
0    3
1    1
2    2
dtype: int32
>>> s = cudf.Series(["Hello", "Bye", "Thanks 😊"])
>>> s.str.byte_count()
0     5
1     3
2    11
dtype: int32
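byte_count() counts UTF-8 bytes rather than characters, which is why the emoji string above reports 11. Python's `str.encode` shows the same arithmetic. `byte_counts` is a hypothetical helper, and None stands for a null:

```python
def byte_counts(strings):
    """Number of UTF-8 bytes per string; None (null) stays None."""
    return [None if s is None else len(s.encode("utf-8")) for s in strings]


print(byte_counts(["Hello", "Bye", "Thanks 😊", None]))  # [5, 3, 11, None]
print(len("Thanks 😊"))  # 8 characters, versus 11 bytes
```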
-
capitalize(**kwargs)¶ Convert strings in the Series/Index to be capitalized. This only applies to ASCII characters at this time.
- Returns
- Series or Index of object
Examples
>>> import cudf
>>> data = ['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe']
>>> s = cudf.Series(data)
>>> s.str.capitalize()
0                 Lower
1              Capitals
2    This is a sentence
3              Swapcase
dtype: object
>>> s = cudf.Series(["hello, friend","goodbye, friend"])
>>> s.str.capitalize()
0      Hello, friend
1    Goodbye, friend
dtype: object
-
cat(others=None, sep=None, na_rep=None, **kwargs)¶ Concatenate strings in the Series/Index with given separator.
If others is specified, this function concatenates the Series/Index and elements of others element-wise. If others is not passed, then all values in the Series/Index are concatenated into a single string with a given sep.
- Parameters
- others : Series or List of str
Strings to be appended. The number of strings must match size() of this instance. This must be either a Series of string dtype or a Python list of strings.
- sep : str
If specified, this separator will be appended to each string before appending the others.
- na_rep : str
This character will take the place of any null strings (not empty strings) in either list.
If na_rep is None, and others is None, missing values in the Series/Index are omitted from the result.
If na_rep is None, and others is not None, a row containing a missing value in any of the columns (before concatenation) will have a missing value in the result.
- Returns
- concat : str or Series/Index of str dtype
If others is None, str is returned, otherwise a Series/Index (same type as caller) of str dtype is returned.
Examples
>>> import cudf
>>> s = cudf.Series(['a', 'b', None, 'd'])
>>> s.str.cat(sep=' ')
'a b d'
By default, NA values in the Series are ignored. Using na_rep, they can be given a representation:
>>> s.str.cat(sep=' ', na_rep='?')
'a b ? d'
If others is specified, corresponding values are concatenated with the separator. Result will be a Series of strings.
>>> s.str.cat(['A', 'B', 'C', 'D'], sep=',')
0     a,A
1     b,B
2    None
3     d,D
dtype: object
Missing values will remain missing in the result, but can again be represented using na_rep:
>>> s.str.cat(['A', 'B', 'C', 'D'], sep=',', na_rep='-')
0    a,A
1    b,B
2    -,C
3    d,D
dtype: object
If sep is not specified, the values are concatenated without separation.
>>> s.str.cat(['A', 'B', 'C', 'D'], na_rep='-')
0    aA
1    bB
2    -C
3    dD
dtype: object
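The null-handling rules for cat() can be restated as a pure-Python model. `cat_strings` is a hypothetical helper in which None marks a null. It reproduces the examples above but is not cuDF's implementation:

```python
def cat_strings(strings, others=None, sep=None, na_rep=None):
    """Hypothetical model of str.cat; None marks a null."""
    sep = "" if sep is None else sep
    if others is None:
        # Reduce to one string; nulls are replaced by na_rep or dropped.
        kept = [na_rep if s is None else s for s in strings]
        return sep.join(s for s in kept if s is not None)
    if len(others) != len(strings):
        raise ValueError("others must have the same length")
    out = []
    for a, b in zip(strings, others):
        a = na_rep if a is None else a
        b = na_rep if b is None else b
        # Without na_rep, a null on either side makes the row null.
        out.append(None if a is None or b is None else a + sep + b)
    return out


s = ["a", "b", None, "d"]
print(cat_strings(s, sep=" "))                                  # 'a b d'
print(cat_strings(s, ["A", "B", "C", "D"], sep=","))            # ['a,A', 'b,B', None, 'd,D']
print(cat_strings(s, ["A", "B", "C", "D"], na_rep="-"))         # ['aA', 'bB', '-C', 'dD']
```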
-
center(width, fillchar=' ', **kwargs)¶ Filling left and right side of strings in the Series/Index with an additional character.
- Parameters
- width : int
Minimum width of resulting string; additional characters will be filled with fillchar.
- fillchar : str, default ‘ ’ (whitespace)
Additional character for filling.
- Returns
- Series/Index of str dtype
Returns Series or Index.
Examples
>>> import cudf
>>> s = cudf.Series(['a', 'b', None, 'd'])
>>> s.str.center(1)
0       a
1       b
2    None
3       d
dtype: object
>>> s.str.center(1, fillchar='-')
0       a
1       b
2    None
3       d
dtype: object
>>> s.str.center(2, fillchar='-')
0      a-
1      b-
2    None
3      d-
dtype: object
>>> s.str.center(5, fillchar='-')
0    --a--
1    --b--
2     None
3    --d--
dtype: object
>>> s.str.center(6, fillchar='-')
0    --a---
1    --b---
2      None
3    --d---
dtype: object
-
character_ngrams(n=2, **kwargs)¶ Generate the n-grams from characters in a column of strings.
- Parameters
- n : int
The degree of the n-gram (number of consecutive characters). Default of 2 for bigrams.
Examples
>>> import cudf
>>> str_series = cudf.Series(['abcd','efgh','xyz'])
>>> str_series.str.character_ngrams(2)
0    ab
1    bc
2    cd
3    ef
4    fg
5    gh
6    xy
7    yz
dtype: object
>>> str_series.str.character_ngrams(3)
0    abc
1    bcd
2    efg
3    fgh
4    xyz
dtype: object
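The sketch below makes two assumptions. Each string contributes the substring of length n that starts at every possible position. A string shorter than n contributes nothing. The documented output is consistent with both, but the treatment of nulls here is an assumption. `char_ngrams` is a hypothetical helper that flattens the results:

```python
def char_ngrams(strings, n=2):
    """Flattened character n-grams; None (null) entries are skipped."""
    grams = []
    for s in strings:
        if s is None:
            continue
        # range() is empty when len(s) < n, so short strings add nothing.
        grams.extend(s[i:i + n] for i in range(len(s) - n + 1))
    return grams


print(char_ngrams(["abcd", "efgh", "xyz"], 2))
# ['ab', 'bc', 'cd', 'ef', 'fg', 'gh', 'xy', 'yz']
print(char_ngrams(["abcd", "efgh", "xyz"], 3))
# ['abc', 'bcd', 'efg', 'fgh', 'xyz']
```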
-
character_tokenize(**kwargs)¶ Each string is split into individual characters. The sequence returned contains each character as an individual string.
- Returns
- Series or Index of object.
Examples
>>> import cudf
>>> data = ["hello world", None, "goodbye, thank you."]
>>> ser = cudf.Series(data)
>>> ser.str.character_tokenize()
0     h
1     e
2     l
3     l
4     o
5
6     w
7     o
8     r
9     l
10    d
11    g
12    o
13    o
14    d
15    b
16    y
17    e
18    ,
19
20    t
21    h
22    a
23    n
24    k
25
26    y
27    o
28    u
29    .
dtype: object
-
code_points(**kwargs)¶ Returns an array by filling it with the UTF-8 code point values for each character of each string. This function uses the len() method to determine the size of each sub-array of integers.
- Returns
- Series or Index.
Examples
>>> import cudf
>>> s = cudf.Series(["a","xyz", "éee"])
>>> s.str.code_points()
0       97
1      120
2      121
3      122
4    50089
5      101
6      101
dtype: int32
>>> s = cudf.Series(["abc"])
>>> s.str.code_points()
0    97
1    98
2    99
dtype: int32
-
contains(pat, case=True, flags=0, na=nan, regex=True, **kwargs)¶ Test if pattern or regex is contained within a string of a Series or Index.
Return boolean Series or Index based on whether a given pattern or regex is contained within a string of a Series or Index.
- Parameters
- patstr
Character sequence or regular expression.
- regexbool, default True
If True, assumes the pattern is a regular expression. If False, treats the pattern as a literal string.
- Returns
- Series/Index of bool dtype
A Series/Index of boolean dtype indicating whether the given pattern is contained within the string of each element of the Series/Index.
Notes
The parameters case, flags, and na are not yet supported and will raise a NotImplementedError if anything other than the default value is set.
Examples
>>> import cudf >>> s1 = cudf.Series(['Mouse', 'dog', 'house and parrot', '23', None]) >>> s1 0 Mouse 1 dog 2 house and parrot 3 23 4 None dtype: object >>> s1.str.contains('og', regex=False) 0 False 1 True 2 False 3 False 4 null dtype: bool
Returning an Index of booleans using only a literal pattern.
>>> import numpy as np >>> data = ['Mouse', 'dog', 'house and parrot', '23.0', np.nan] >>> ind = cudf.core.index.StringIndex(data) >>> ind.str.contains('23', regex=False) Index(['False', 'False', 'False', 'True', 'null'], dtype='object')
Returning ‘house’ or ‘dog’ when either expression occurs in a string.
>>> s1.str.contains('house|dog', regex=True) 0 False 1 True 2 True 3 False 4 null dtype: bool
Returning any digit using regular expression.
>>> s1.str.contains('\d', regex=True) # noqa W605 0 False 1 False 2 False 3 True 4 null dtype: bool
Ensure pat is not a literal pattern when regex is set to True. Note that in the following example one might expect only s2[1] and s2[3] to return True. However, ‘.0’ as a regex matches any character followed by a 0.
>>> s2 = cudf.Series(['40', '40.0', '41', '41.0', '35']) >>> s2.str.contains('.0', regex=True) 0 True 1 True 2 False 3 True 4 False dtype: bool
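The difference between regex and literal matching can be reproduced with Python's re module. This is a CPU illustration only: the helper contains is hypothetical, None stands in for cudf's null, and Python's regex engine may differ from cudf's in edge cases.

```python
import re


def contains(strings, pat, regex=True):
    """Per-element containment test; None stays None (cudf shows it as null)."""
    out = []
    for s in strings:
        if s is None:
            out.append(None)
        elif regex:
            out.append(re.search(pat, s) is not None)  # '.' matches any character
        else:
            out.append(pat in s)                       # plain substring test
    return out


s2 = ['40', '40.0', '41', '41.0', '35']
print(contains(s2, '.0'))                # [True, True, False, True, False]
print(contains(s2, '.0', regex=False))   # [False, True, False, True, False]
print(contains(s2, re.escape('.0')))     # escaped regex behaves like the literal
```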
-
count(pat, flags=0, **kwargs)¶ Count occurrences of pattern in each string of the Series/Index.
This function is used to count the number of times a particular regex pattern is repeated in each of the string elements of the Series.
- Parameters
- pat : str
Valid regular expression.
- Returns
- Series or Index
Notes
The flags parameter is not yet supported.
Some characters need to be escaped when passing in pat. For example, '$' has a special meaning in regex and must be escaped to find this literal character.
Examples
>>> import cudf >>> s = cudf.Series(['A', 'B', 'Aaba', 'Baca', None, 'CABA', 'cat']) >>> s.str.count('a') 0 0 1 0 2 2 3 2 4 null 5 0 6 1 dtype: int32
Escape '$' to find the literal dollar sign.
>>> s = cudf.Series(['$', 'B', 'Aab$', '$$ca', 'C$B$', 'cat']) >>> s.str.count('\$') # noqa W605 0 1 1 0 2 1 3 2 4 2 5 0 dtype: int32
This is also available on Index.
>>> index = cudf.core.index.StringIndex(['A', 'A', 'Aaba', 'cat']) >>> index.str.count('a') Int64Index([0, 0, 2, 1], dtype='int64')
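A pure-Python sketch of the counting rule, assuming (as with re.findall) that only non-overlapping matches are counted; count_matches is a hypothetical helper, not a cudf API. re.escape is one way to build the escaped pattern instead of writing the backslash by hand.

```python
import re


def count_matches(strings, pat):
    """Number of non-overlapping regex matches per string; None stays None."""
    return [None if s is None else len(re.findall(pat, s)) for s in strings]


print(count_matches(['A', 'B', 'Aaba', 'Baca', None, 'CABA', 'cat'], 'a'))
# [0, 0, 2, 2, None, 0, 1]

# re.escape turns '$' into '\$', so it matches the literal dollar sign.
print(count_matches(['$', 'B', 'Aab$', '$$ca', 'C$B$', 'cat'], re.escape('$')))
# [1, 0, 1, 2, 2, 0]
```

In Python's re, an unescaped '$' is an end-of-string anchor, so count_matches(['cat'], '$') returns [1] (one empty match) rather than [0].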
-
detokenize(indices, separator=' ', **kwargs)¶ Combines tokens into strings by concatenating them in the order in which they appear in the indices column. The separator is concatenated between each token.
- Parameters
- indices : list of ints
Each value identifies the output row for the corresponding token.
- separator : str
The string concatenated between each token in an output row. Default is space.
- Returns
- Series or Index of object.
Examples
>>> import cudf >>> strs = cudf.Series(["hello", "world", "one", "two", "three"]) >>> indices = cudf.Series([0, 0, 1, 1, 2]) >>> strs.str.detokenize(indices) 0 hello world 1 one two 2 three dtype: object
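The grouping performed by detokenize can be sketched in pure Python: tokens are bucketed by their row index, kept in their original order, and joined with the separator. The helper below is hypothetical, and it assumes output rows are ordered by index value.

```python
def detokenize(tokens, indices, separator=" "):
    """Join tokens that share an output-row index, preserving token order."""
    rows = {}
    for tok, row in zip(tokens, indices):
        rows.setdefault(row, []).append(tok)
    # Assumption: output rows are emitted in ascending index order.
    return [separator.join(rows[r]) for r in sorted(rows)]


print(detokenize(["hello", "world", "one", "two", "three"], [0, 0, 1, 1, 2]))
# ['hello world', 'one two', 'three']
```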
-
endswith(pat, **kwargs)¶ Test if the end of each string element matches a pattern.
- Parameters
- pat : str or list-like
If pat is a str, evaluates whether each string of the series ends with pat. If pat is list-like, evaluates whether self[i] ends with pat[i]. Regular expressions are not accepted.
- Returns
- Series or Index of bool
A Series of booleans indicating whether the given pattern matches the end of each string element.
Notes
na parameter is not yet supported, as cudf uses native strings instead of Python objects.
Examples
>>> import cudf >>> s = cudf.Series(['bat', 'bear', 'caT', None]) >>> s 0 bat 1 bear 2 caT 3 None dtype: object >>> s.str.endswith('t') 0 True 1 False 2 False 3 null dtype: bool
-
extract(pat, flags=0, expand=True, **kwargs)¶ Extract capture groups in the regex pat as columns in a DataFrame.
For each subject string in the Series, extract groups from the first match of regular expression pat.
- Parameters
- pat : str
Regular expression pattern with capturing groups.
- expand : bool, default True
If True, return a DataFrame with one column per capture group. If False, return a Series/Index if there is one capture group or a DataFrame if there are multiple capture groups.
- Returns
- DataFrame or Series/Index
A DataFrame with one row for each subject string, and one column for each group. If expand=False and pat has only one capture group, then return a Series/Index.
Notes
The flags parameter is not yet supported and will raise a NotImplementedError if anything other than the default value is passed.
Examples
>>> import cudf >>> s = cudf.Series(['a1', 'b2', 'c3']) >>> s.str.extract(r'([ab])(\d)') # noqa W605 0 1 0 a 1 1 b 2 2 None None
A pattern with one group will return a DataFrame with one column if expand=True.
>>> s.str.extract(r'[ab](\d)', expand=True) # noqa W605 0 0 1 1 2 2 None
A pattern with one group will return a Series if expand=False.
>>> s.str.extract(r'[ab](\d)', expand=False) # noqa W605 0 1 1 2 2 None dtype: object
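A CPU sketch of the extract rule using Python's re module: the first match of pat in each string supplies one value per capture group, and strings without a match yield None for every group. The helper extract is hypothetical; lists of tuples stand in for a DataFrame and a flat list for a Series.

```python
import re


def extract(strings, pat, expand=True):
    """First-match capture groups per string; None where there is no match."""
    rx = re.compile(pat)
    rows = []
    for s in strings:
        m = rx.search(s) if s is not None else None
        rows.append(m.groups() if m else (None,) * rx.groups)
    if not expand and rx.groups == 1:
        return [row[0] for row in rows]  # Series-like: one value per string
    return rows                          # DataFrame-like: one tuple per string


print(extract(['a1', 'b2', 'c3'], r'([ab])(\d)'))
# [('a', '1'), ('b', '2'), (None, None)]
print(extract(['a1', 'b2', 'c3'], r'[ab](\d)', expand=False))
# ['1', '2', None]
```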
-
filter_alphanum(repl=None, keep=True, **kwargs)¶ Remove non-alphanumeric characters from strings in this column.
- Parameters
- repl : str
Optional string to use in place of removed characters.
- keep : bool
Set to False to remove all alphanumeric characters instead of keeping them.
- Returns
- Series/Index of str dtype
Strings with only alphanumeric characters.
Examples
>>> import cudf >>> s = cudf.Series(["pears £12", "plums $34", "Temp 72℉", "100K℧"]) >>> s.str.filter_alphanum(" ") 0 pears 12 1 plums 34 2 Temp 72 3 100K dtype: object
-
filter_tokens(min_token_length, replacement=None, delimiter=None, **kwargs)¶ Remove tokens from within each string in the series that are smaller than min_token_length and optionally replace them with the replacement string. Tokens are identified by the delimiter character provided.
- Parameters
- min_token_length: int
Minimum number of characters for a token to be retained in the output string.
- replacement : str
String used in place of removed tokens.
- delimiter : str
The character(s) used to locate the tokens of each string. Default is whitespace.
- Returns
- Series or Index of object.
Examples
>>> import cudf >>> sr = cudf.Series(["this is me", "theme music", ""]) >>> sr.str.filter_tokens(3, replacement="_") 0 this _ _ 1 theme music 2 dtype: object >>> sr = cudf.Series(["this;is;me", "theme;music", ""]) >>> sr.str.filter_tokens(5,None,";") 0 ;; 1 theme;music 2 dtype: object
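A pure-Python sketch that reproduces both examples above: split on the delimiter, replace each token shorter than min_token_length (with the empty string when no replacement is given), and re-join with the same delimiter. The helper is hypothetical. It assumes a single-space default delimiter (cudf treats any whitespace as a delimiter) and that empty pieces, such as the one from an empty string, are not tokens.

```python
def filter_tokens(strings, min_token_length, replacement=None, delimiter=None):
    """Replace tokens shorter than min_token_length, keeping delimiters in place."""
    sep = " " if delimiter is None else delimiter   # assumption: single-space default
    repl = "" if replacement is None else replacement
    out = []
    for s in strings:
        tokens = s.split(sep)
        # Empty pieces are left alone, so "" stays "" instead of becoming repl.
        kept = [t if (t == "" or len(t) >= min_token_length) else repl for t in tokens]
        out.append(sep.join(kept))
    return out


print(filter_tokens(["this is me", "theme music", ""], 3, replacement="_"))
# ['this _ _', 'theme music', '']
print(filter_tokens(["this;is;me", "theme;music", ""], 5, None, ";"))
# [';;', 'theme;music', '']
```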
-
find(sub, start=0, end=None, **kwargs)¶ Return the lowest index in each string in the Series/Index where the substring is fully contained within [start:end]. Return -1 on failure.
- Parameters
- sub : str
Substring being searched.
- start : int
Left edge index.
- end : int
Right edge index.
- Returns
- Series or Index of int
Examples
>>> import cudf >>> s = cudf.Series(['abc', 'a','b' ,'ddb']) >>> s.str.find('b') 0 1 1 -1 2 0 3 2 dtype: int32
Parameters such as start and end can also be used.
>>> s.str.find('b', start=1, end=5) 0 1 1 -1 2 -1 3 2 dtype: int32
-
findall(pat, flags=0, **kwargs)¶ Find all occurrences of pattern or regular expression in the Series/Index.
- Parameters
- pat : str
Pattern or regular expression.
- Returns
- DataFrame
All non-overlapping matches of pattern or regular expression in each string of this Series/Index.
Notes
flags parameter is currently not supported.
Examples
>>> import cudf >>> s = cudf.Series(['Lion', 'Monkey', 'Rabbit'])
The search for the pattern ‘Monkey’ returns one match:
>>> s.str.findall('Monkey') 0 0 None 1 Monkey 2 None
When the pattern matches more than one string in the Series, all matches are returned:
>>> s.str.findall('on') 0 0 on 1 on 2 None
Regular expressions are supported too. For instance, the search for all the strings ending with the word ‘on’ is shown next:
>>> s.str.findall('on$') 0 0 on 1 None 2 None
If the pattern is found more than once in the same string, then multiple strings are returned as columns:
>>> s.str.findall('b') 0 1 0 None None 1 None None 2 b b
-
get(i=0, **kwargs)¶ Extract element from each component at specified position.
- Parameters
- i : int
Position of element to extract.
- Returns
- Series/Index of str dtype
Examples
>>> import cudf >>> s = cudf.Series(["hello world", "rapids", "cudf"]) >>> s 0 hello world 1 rapids 2 cudf dtype: object >>> s.str.get(10) 0 d 1 2 dtype: object >>> s.str.get(1) 0 e 1 a 2 u dtype: object
get also accepts a negative index.
>>> s.str.get(-1) 0 d 1 s 2 f dtype: object
-
htoi()¶ Returns the integer value represented by each hex string. Each string is interpreted as hex (base-16) characters.
- Returns
- Series/Index of int64 dtype
Examples
>>> import cudf >>> s = cudf.Series(["1234", "ABCDEF", "1A2", "cafe"]) >>> s.str.htoi() 0 4660 1 11259375 2 418 3 51966 dtype: int64
-
index(sub, start=0, end=None, **kwargs)¶ Return the lowest index in each string where the substring is fully contained within [start:end]. This is the same as str.find except that, instead of returning -1, it raises a ValueError when the substring is not found.
- Parameters
- sub : str
Substring being searched.
- start : int
Left edge index.
- end : int
Right edge index.
- Returns
- Series or Index of int
Examples
>>> import cudf >>> s = cudf.Series(['abc', 'a','b' ,'ddb']) >>> s.str.index('b') Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: substring not found
Parameters such as start and end can also be used.
>>> s = cudf.Series(['abc', 'abb','ab' ,'ddb']) >>> s.str.index('b', start=1, end=5) 0 1 1 1 2 1 3 2 dtype: int32
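Python's built-in str.find and str.index follow the same start/end convention (the search covers s[start:end], and the reported position is relative to the whole string), so they can serve as a CPU reference for the find and index examples above. The helpers below are hypothetical wrappers, not cudf APIs.

```python
def find_all(strings, sub, start=0, end=None):
    # str.find searches s[start:end] but reports the index within the whole string.
    return [s.find(sub, start, end) for s in strings]


def index_all(strings, sub, start=0, end=None):
    # Same search, but str.index raises ValueError when the substring is absent.
    return [s.index(sub, start, end) for s in strings]


print(find_all(['abc', 'a', 'b', 'ddb'], 'b'))            # [1, -1, 0, 2]
print(find_all(['abc', 'a', 'b', 'ddb'], 'b', 1, 5))      # [1, -1, -1, 2]
print(index_all(['abc', 'abb', 'ab', 'ddb'], 'b', 1, 5))  # [1, 1, 1, 2]
```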
-
insert(start=0, repl=None, **kwargs)¶ Insert the specified string into each string at the specified position.
- Parameters
- start : int
Position at which to insert the string. Default is the beginning of each string. Specify -1 to insert at the end of each string.
- repl : str
String to insert into the specified position value.
- Returns
- Series/Index of str dtype
A new string series with the specified string inserted at the specified position.
Examples
>>> import cudf >>> s = cudf.Series(["abcdefghij", "0123456789"]) >>> s.str.insert(2, '_') 0 ab_cdefghij 1 01_23456789 dtype: object
When no repl is passed, nothing is inserted.
>>> s.str.insert(2) 0 abcdefghij 1 0123456789 dtype: object
Negative values are also supported for start.
>>> s.str.insert(-1,'_') 0 abcdefghij_ 1 0123456789_ dtype: object
-
ip2int()¶ Converts IPv4 address strings to integers.
- Returns
- Series/Index of int64 dtype
Examples
>>> import cudf >>> s = cudf.Series(["12.168.1.1", "10.0.0.1"]) >>> s.str.ip2int() 0 212336897 1 167772161 dtype: int64
Returns 0 for any string that is not an IP address.
>>> s = cudf.Series(["12.168.1.1", "10.0.0.1", "abc"]) >>> s.str.ip2int() 0 212336897 1 167772161 2 0 dtype: int64
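The integers above come from packing the four octets into 32 bits, so a.b.c.d maps to a·2^24 + b·2^16 + c·2^8 + d; for example 12.168.1.1 gives 201326592 + 11010048 + 256 + 1 = 212336897. A pure-Python sketch; the helper is hypothetical, and the rule that anything other than four integers in 0..255 maps to 0 is an assumption generalised from the "abc" example.

```python
def ip2int(s):
    """Pack a dotted-quad IPv4 string into one integer; 0 for anything else (assumed)."""
    parts = s.split(".")
    if len(parts) != 4 or not all(p.isascii() and p.isdigit() and int(p) <= 255 for p in parts):
        return 0
    a, b, c, d = (int(p) for p in parts)
    # Each octet occupies 8 bits: a.b.c.d -> a*2**24 + b*2**16 + c*2**8 + d
    return (a << 24) | (b << 16) | (c << 8) | d


print([ip2int(s) for s in ["12.168.1.1", "10.0.0.1", "abc"]])
# [212336897, 167772161, 0]
```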
-
is_consonant(position, **kwargs)¶ Return true for strings where the character at position is a consonant. The position parameter may also be a list of integers to check different characters per string. If the position is larger than the string length, False is returned for that string.
- Parameters
- position : int or list-like
The character position to check within each string.
- Returns
- Series or Index of bool dtype.
Examples
>>> import cudf >>> ser = cudf.Series(["toy", "trouble"]) >>> ser.str.is_consonant(1) 0 False 1 True dtype: bool >>> positions = cudf.Series([2, 3]) >>> ser.str.is_consonant(positions) 0 True 1 False dtype: bool
-
is_vowel(position, **kwargs)¶ Return true for strings where the character at position is a vowel, that is, not a consonant. The position parameter may also be a list of integers to check different characters per string. If the position is larger than the string length, False is returned for that string.
- Parameters
- position : int or list-like
The character position to check within each string.
- Returns
- Series or Index of bool dtype.
Examples
>>> import cudf >>> ser = cudf.Series(["toy", "trouble"]) >>> ser.str.is_vowel(1) 0 True 1 False dtype: bool >>> positions = cudf.Series([2, 3]) >>> ser.str.is_vowel(positions) 0 False 1 True dtype: bool
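The results above agree with the consonant definition from Porter's stemming algorithm: a consonant is a letter other than a, e, i, o, u, and other than a y that follows a consonant (so the y in "toy" counts as a consonant). A recursive pure-Python sketch, assuming ASCII letters; the helper names are hypothetical and cudf's handling of non-letters is not modelled.

```python
VOWELS = "aeiou"


def is_consonant(word, i):
    """Porter-style consonant test for word[i]; False when i is out of range."""
    if i >= len(word):
        return False
    ch = word[i].lower()
    if ch in VOWELS:
        return False
    if ch == "y":
        # y is a consonant at the start of a word or after a vowel.
        return i == 0 or not is_consonant(word, i - 1)
    return True


def is_vowel(word, i):
    """True when word[i] exists and is not a consonant."""
    return i < len(word) and not is_consonant(word, i)


print([is_consonant(w, 1) for w in ["toy", "trouble"]])               # [False, True]
print([is_consonant(w, p) for w, p in [("toy", 2), ("trouble", 3)]])  # [True, False]
```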
-
isalnum(**kwargs)¶ Check whether all characters in each string are alphanumeric.
This is equivalent to running the Python string method str.isalnum() for each element of the Series/Index. If a string has zero characters, False is returned for that check.
Equivalent to: isalpha() or isdigit() or isnumeric() or isdecimal()
- Returns
- Series or Index of bool
Series or Index of boolean values with the same length as the original Series/Index.
See also
isalpha : Check whether all characters are alphabetic.
isdecimal : Check whether all characters are decimal.
isdigit : Check whether all characters are digits.
isinteger : Check whether all characters are integer.
isnumeric : Check whether all characters are numeric.
isfloat : Check whether all characters are float.
islower : Check whether all characters are lowercase.
isspace : Check whether all characters are whitespace.
isupper : Check whether all characters are uppercase.
Examples
>>> import cudf >>> s1 = cudf.Series(['one', 'one1', '1', '']) >>> s1.str.isalnum() 0 True 1 True 2 True 3 False dtype: bool
Note that checks against characters mixed with any additional punctuation or whitespace will evaluate to false for an alphanumeric check.
>>> s2 = cudf.Series(['A B', '1.5', '3,000']) >>> s2.str.isalnum() 0 False 1 False 2 False dtype: bool
-
isalpha(**kwargs)¶ Check whether all characters in each string are alphabetic.
This is equivalent to running the Python string method str.isalpha() for each element of the Series/Index. If a string has zero characters, False is returned for that check.
- Returns
- Series or Index of bool
Series or Index of boolean values with the same length as the original Series/Index.
See also
isalnum : Check whether all characters are alphanumeric.
isdecimal : Check whether all characters are decimal.
isdigit : Check whether all characters are digits.
isinteger : Check whether all characters are integer.
isnumeric : Check whether all characters are numeric.
isfloat : Check whether all characters are float.
islower : Check whether all characters are lowercase.
isspace : Check whether all characters are whitespace.
isupper : Check whether all characters are uppercase.
Examples
>>> import cudf >>> s1 = cudf.Series(['one', 'one1', '1', '']) >>> s1.str.isalpha() 0 True 1 False 2 False 3 False dtype: bool
-
isdecimal(**kwargs)¶ Check whether all characters in each string are decimal.
This is equivalent to running the Python string method str.isdecimal() for each element of the Series/Index. If a string has zero characters, False is returned for that check.
- Returns
- Series or Index of bool
Series or Index of boolean values with the same length as the original Series/Index.
See also
isalnum : Check whether all characters are alphanumeric.
isalpha : Check whether all characters are alphabetic.
isdigit : Check whether all characters are digits.
isinteger : Check whether all characters are integer.
isnumeric : Check whether all characters are numeric.
isfloat : Check whether all characters are float.
islower : Check whether all characters are lowercase.
isspace : Check whether all characters are whitespace.
isupper : Check whether all characters are uppercase.
Examples
>>> import cudf >>> s3 = cudf.Series(['23', '³', '⅕', ''])
The s3.str.isdecimal method checks for characters used to form numbers in base 10.
>>> s3.str.isdecimal() 0 True 1 False 2 False 3 False dtype: bool
-
isdigit(**kwargs)¶ Check whether all characters in each string are digits.
This is equivalent to running the Python string method str.isdigit() for each element of the Series/Index. If a string has zero characters, False is returned for that check.
- Returns
- Series or Index of bool
Series or Index of boolean values with the same length as the original Series/Index.
See also
isalnum : Check whether all characters are alphanumeric.
isalpha : Check whether all characters are alphabetic.
isdecimal : Check whether all characters are decimal.
isinteger : Check whether all characters are integer.
isnumeric : Check whether all characters are numeric.
isfloat : Check whether all characters are float.
islower : Check whether all characters are lowercase.
isspace : Check whether all characters are whitespace.
isupper : Check whether all characters are uppercase.
Examples
>>> import cudf >>> s = cudf.Series(['23', '³', '⅕', ''])
The s.str.isdigit method is the same as s.str.isdecimal but also includes special digits, like superscripted and subscripted digits in unicode.
>>> s.str.isdigit() 0 True 1 True 2 False 3 False dtype: bool
-
isempty(**kwargs)¶ Check whether each string is an empty string.
- Returns
- Series or Index of bool
Series or Index of boolean values with the same length as the original Series/Index.
Examples
>>> import cudf >>> s = cudf.Series(["1", "abc", "", " ", None]) >>> s.str.isempty() 0 False 1 False 2 True 3 False 4 False dtype: bool
-
isfloat(**kwargs)¶ Check whether all characters in each string form a floating-point value.
If a string has zero characters, False is returned for that check.
- Returns
- Series or Index of bool
Series or Index of boolean values with the same length as the original Series/Index.
See also
isalnum : Check whether all characters are alphanumeric.
isalpha : Check whether all characters are alphabetic.
isdecimal : Check whether all characters are decimal.
isdigit : Check whether all characters are digits.
isinteger : Check whether all characters are integer.
isnumeric : Check whether all characters are numeric.
islower : Check whether all characters are lowercase.
isspace : Check whether all characters are whitespace.
isupper : Check whether all characters are uppercase.
Examples
>>> import cudf >>> s = cudf.Series(["1.1", "0.123213", "+0.123", "-100.0001", "234", ... "3-"]) >>> s.str.isfloat() 0 True 1 True 2 True 3 True 4 True 5 False dtype: bool >>> s = cudf.Series(["this is plain text", "\t\n", "9.9", "9.9.9"]) >>> s.str.isfloat() 0 False 1 False 2 True 3 False dtype: bool
-
ishex(**kwargs)¶ Check whether all characters in each string form a hex integer.
If a string has zero characters, False is returned for that check.
- Returns
- Series or Index of bool
Series or Index of boolean values with the same length as the original Series/Index.
Examples
>>> import cudf >>> s = cudf.Series(["", "123DEF", "0x2D3", "-15", "abc"]) >>> s.str.ishex() 0 False 1 True 2 True 3 False 4 True dtype: bool
-
isinteger(**kwargs)¶ Check whether all characters in each string form an integer.
If a string has zero characters, False is returned for that check.
- Returns
- Series or Index of bool
Series or Index of boolean values with the same length as the original Series/Index.
See also
isalnum : Check whether all characters are alphanumeric.
isalpha : Check whether all characters are alphabetic.
isdecimal : Check whether all characters are decimal.
isdigit : Check whether all characters are digits.
isnumeric : Check whether all characters are numeric.
isfloat : Check whether all characters are float.
islower : Check whether all characters are lowercase.
isspace : Check whether all characters are whitespace.
isupper : Check whether all characters are uppercase.
Examples
>>> import cudf >>> s = cudf.Series(["1", "0.1", "+100", "-15", "abc"]) >>> s.str.isinteger() 0 True 1 False 2 True 3 True 4 False dtype: bool >>> s = cudf.Series(["this is plain text", "", "10 10"]) >>> s.str.isinteger() 0 False 1 False 2 False dtype: bool
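One way to picture isinteger and isfloat is as full-string matches against small grammars: an optional sign followed by digits, and, for floats, digits with an optional decimal point. The patterns below are assumptions inferred from the isinteger and isfloat examples, not cudf's actual rules; exponents such as "1e5", "inf" and "nan" are deliberately not modelled, and the helper names are hypothetical.

```python
import re

# Assumed grammars inferred from the examples above; cudf's exact rules may differ
# (exponents such as "1e5", "inf" and "nan" are not modelled here).
INTEGER_RE = re.compile(r"[+-]?[0-9]+")
FLOAT_RE = re.compile(r"[+-]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)")


def isinteger(s):
    return INTEGER_RE.fullmatch(s) is not None


def isfloat(s):
    return FLOAT_RE.fullmatch(s) is not None


print([isinteger(s) for s in ["1", "0.1", "+100", "-15", "abc"]])
# [True, False, True, True, False]
print([isfloat(s) for s in ["1.1", "0.123213", "+0.123", "-100.0001", "234", "3-"]])
# [True, True, True, True, True, False]
```

Both patterns require at least one digit, so the empty string fails them, consistent with the documented zero-character rule.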
-
isipv4(**kwargs)¶ Check whether all characters in each string form an IPv4 address.
If a string has zero characters, False is returned for that check.
- Returns
- Series or Index of bool
Series or Index of boolean values with the same length as the original Series/Index.
Examples
>>> import cudf >>> s = cudf.Series(["", "127.0.0.1", "255.255.255.255", "123.456"]) >>> s.str.isipv4() 0 False 1 True 2 True 3 False dtype: bool
-
islower(**kwargs)¶ Check whether all characters in each string are lowercase.
This is equivalent to running the Python string method str.islower() for each element of the Series/Index. If a string has zero characters, False is returned for that check.
- Returns
- Series or Index of bool
Series or Index of boolean values with the same length as the original Series/Index.
See also
isalnum : Check whether all characters are alphanumeric.
isalpha : Check whether all characters are alphabetic.
isdecimal : Check whether all characters are decimal.
isdigit : Check whether all characters are digits.
isinteger : Check whether all characters are integer.
isnumeric : Check whether all characters are numeric.
isfloat : Check whether all characters are float.
isspace : Check whether all characters are whitespace.
isupper : Check whether all characters are uppercase.
Examples
>>> import cudf >>> s = cudf.Series(['leopard', 'Golden Eagle', 'SNAKE', '']) >>> s.str.islower() 0 True 1 False 2 False 3 False dtype: bool
-
isnumeric(**kwargs)¶ Check whether all characters in each string are numeric.
This is equivalent to running the Python string method str.isnumeric() for each element of the Series/Index. If a string has zero characters, False is returned for that check.
- Returns
- Series or Index of bool
Series or Index of boolean values with the same length as the original Series/Index.
See also
isalnum : Check whether all characters are alphanumeric.
isalpha : Check whether all characters are alphabetic.
isdecimal : Check whether all characters are decimal.
isdigit : Check whether all characters are digits.
isinteger : Check whether all characters are integer.
isfloat : Check whether all characters are float.
islower : Check whether all characters are lowercase.
isspace : Check whether all characters are whitespace.
isupper : Check whether all characters are uppercase.
Examples
>>> import cudf >>> s1 = cudf.Series(['one', 'one1', '1', '']) >>> s1.str.isnumeric() 0 False 1 False 2 True 3 False dtype: bool
The s2.str.isnumeric method is the same as s2.str.isdigit but also includes other characters that can represent quantities, such as unicode fractions.
>>> s2 = cudf.Series(['23', '³', '⅕', '']) >>> s2.str.isnumeric() 0 True 1 True 2 True 3 False dtype: bool
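Because these methods are documented as equivalent to Python's own str.isdecimal, str.isdigit and str.isnumeric, the built-ins can be used on the CPU to see how the three checks differ on the sample strings used above. The helper classify is hypothetical; only the standard library is needed.

```python
def classify(s):
    """(isdecimal, isdigit, isnumeric) using Python's built-in str methods."""
    return (s.isdecimal(), s.isdigit(), s.isnumeric())


# '23', SUPERSCRIPT THREE, VULGAR FRACTION ONE FIFTH, and the empty string
for s in ['23', '\u00b3', '\u2155', '']:
    print(repr(s), classify(s))
```

Each check is stricter than the next: every decimal character is a digit, and every digit is numeric, while the empty string fails all three.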
-
isspace(**kwargs)¶ Check whether all characters in each string are whitespace.
This is equivalent to running the Python string method str.isspace() for each element of the Series/Index. If a string has zero characters, False is returned for that check.
- Returns
- Series or Index of bool
Series or Index of boolean values with the same length as the original Series/Index.
See also
isalnum : Check whether all characters are alphanumeric.
isalpha : Check whether all characters are alphabetic.
isdecimal : Check whether all characters are decimal.
isdigit : Check whether all characters are digits.
isinteger : Check whether all characters are integer.
isnumeric : Check whether all characters are numeric.
isfloat : Check whether all characters are float.
islower : Check whether all characters are lowercase.
isupper : Check whether all characters are uppercase.
Examples
>>> import cudf >>> s = cudf.Series([' ', '\t\r\n ', '']) >>> s.str.isspace() 0 True 1 True 2 False dtype: bool
-
isupper(**kwargs)¶ Check whether all characters in each string are uppercase.
This is equivalent to running the Python string method str.isupper() for each element of the Series/Index. If a string has zero characters, False is returned for that check.
- Returns
- Series or Index of bool
Series or Index of boolean values with the same length as the original Series/Index.
See also
isalnum : Check whether all characters are alphanumeric.
isalpha : Check whether all characters are alphabetic.
isdecimal : Check whether all characters are decimal.
isdigit : Check whether all characters are digits.
isinteger : Check whether all characters are integer.
isnumeric : Check whether all characters are numeric.
isfloat : Check whether all characters are float.
islower : Check whether all characters are lowercase.
isspace : Check whether all characters are whitespace.
Examples
>>> import cudf >>> s = cudf.Series(['leopard', 'Golden Eagle', 'SNAKE', '']) >>> s.str.isupper() 0 False 1 False 2 True 3 False dtype: bool
-
join(sep)¶ Join lists contained as elements in the Series/Index with passed delimiter.
- Raises
- NotImplementedError
Columns of arrays / lists are not yet supported.
-
len(**kwargs)¶ Computes the length of each element in the Series/Index.
- Returns
- Series or Index of int
A Series or Index of integer values indicating the length of each element in the Series or Index.
Examples
>>> import cudf >>> s = cudf.Series(["dog", "", "\n", None]) >>> s.str.len() 0 3 1 0 2 1 3 null dtype: int32
-
ljust(width, fillchar=' ', **kwargs)¶ Fill the right side of strings in the Series/Index with an additional character. Equivalent to str.ljust().
- Parameters
- width : int
Minimum width of resulting string; additional characters will be filled with fillchar.
- fillchar : str, default ‘ ‘ (whitespace)
Additional character for filling, default is whitespace.
- Returns
- Series/Index of str dtype
Returns Series or Index.
Examples
>>> import cudf >>> s = cudf.Series(["hello world", "rapids ai"]) >>> s.str.ljust(10, fillchar="_") 0 hello world 1 rapids ai_ dtype: object >>> s = cudf.Series(["a", "", "ab", "__"]) >>> s.str.ljust(1, fillchar="-") 0 a 1 - 2 ab 3 __ dtype: object
-
lower(**kwargs)¶ Converts all characters to lowercase.
Equivalent to str.lower().
- Returns
- Series or Index of object
A copy of the object with all strings converted to lowercase.
See also
upper : Converts all characters to uppercase.
title : Converts first character of each word to uppercase and remaining to lowercase.
capitalize : Converts first character to uppercase and remaining to lowercase.
swapcase : Converts uppercase to lowercase and lowercase to uppercase.
Examples
>>> import cudf >>> data = ['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe'] >>> s = cudf.Series(data) >>> s.str.lower() 0 lower 1 capitals 2 this is a sentence 3 swapcase dtype: object
-
lstrip(to_strip=None, **kwargs)¶ Remove leading characters.
Strip whitespace (including newlines) or a set of specified characters from the left side of each string in the Series/Index. Equivalent to str.lstrip().
- Parameters
- to_strip : str or None, default None
The set of characters to be removed. All combinations of this set of characters will be stripped. If None, whitespace is removed.
- Returns
- Series or Index of object
Examples
>>> import cudf >>> s = cudf.Series(['1. Ant. ', '2. Bee!\n', '3. Cat?\t', None]) >>> s.str.lstrip('123.') 0 Ant. 1 Bee!\n 2 Cat?\t 3 None dtype: object
-
match(pat, case=True, flags=0, **kwargs)¶ Determine if each string matches a regular expression.
- Parameters
- pat : str
Character sequence or regular expression.
- Returns
- Series or Index of boolean values.
Notes
Parameters currently not supported are: case, flags and na.
Examples
>>> import cudf >>> s = cudf.Series(["rapids", "ai", "cudf"])
Checking for strings starting with a.
>>> s.str.match('a') 0 False 1 True 2 False dtype: bool
Checking for strings starting with any of a or c.
>>> s.str.match('[ac]') 0 False 1 True 2 True dtype: bool
-
ngrams(n=2, separator='_', **kwargs)¶ Generate the n-grams from a set of tokens; each record in the series is treated as a token.
You can generate tokens from a Series instance using the Series.str.tokenize() function.
- Parameters
- n : int
The degree of the n-gram (number of consecutive tokens). Default of 2 for bigrams.
- separator : str
The separator to use between tokens within an n-gram. Default is ‘_’.
Examples
>>> import cudf >>> str_series = cudf.Series(['this is my', 'favorite book']) >>> str_series.str.ngrams(2, "_") 0 this is my_favorite book dtype: object >>> str_series = cudf.Series(['abc','def','xyz','hhh']) >>> str_series.str.ngrams(2, "_") 0 abc_def 1 def_xyz 2 xyz_hhh dtype: object
-
ngrams_tokenize(n=2, delimiter=' ', separator='_', **kwargs)¶ Generate the n-grams using tokens from each string. This will tokenize each string and then generate ngrams for each string.
- Parameters
- n : int, default 2
The degree of the n-gram (number of consecutive tokens).
- delimiter : str, default is whitespace
The character used to locate the split points of each string.
- separator : str, default is ‘_’
The separator to use between tokens within an n-gram.
- Returns
- Series or Index of object.
Examples
>>> import cudf >>> ser = cudf.Series(['this is the', 'best book']) >>> ser.str.ngrams_tokenize(n=2, separator='_') 0 this_is 1 is_the 2 best_book dtype: object
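The difference between ngrams and ngrams_tokenize is where the token boundaries come from: ngrams treats each row as one token and builds n-grams across rows, while ngrams_tokenize first splits each string and never lets an n-gram span two strings. A pure-Python sketch of both; the helpers are hypothetical and assume empty pieces from repeated delimiters are dropped.

```python
def ngrams(tokens, n=2, separator="_"):
    """Join every run of n consecutive tokens; each element counts as one token."""
    return [separator.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngrams_tokenize(strings, n=2, delimiter=" ", separator="_"):
    """Tokenize each string separately, so an n-gram never spans two strings."""
    out = []
    for s in strings:
        tokens = [t for t in s.split(delimiter) if t]  # drop empty pieces
        out.extend(ngrams(tokens, n, separator))
    return out


print(ngrams(['abc', 'def', 'xyz', 'hhh']))           # ['abc_def', 'def_xyz', 'xyz_hhh']
print(ngrams_tokenize(['this is the', 'best book']))  # ['this_is', 'is_the', 'best_book']
```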
-
normalize_characters(do_lower=True, **kwargs)¶ Normalizes string characters for tokenizing.
This uses the normalizer that is built into the subword_tokenize function, which includes:
adding padding around punctuation (unicode category starts with “P”) as well as certain ASCII symbols like “^” and “$”
adding padding around the CJK Unicode block characters
changing whitespace (e.g. \t, \n, \r) to space
removing control characters (unicode categories “Cc” and “Cf”)
If do_lower is True, lower-casing also removes the accents. The accents cannot be removed from upper-case characters without lower-casing, and lower-casing cannot be performed without also removing accents. However, if the accented character is already lower-case, then only the accent is removed.
- Parameters
- do_lower : bool, default True
If set to True, characters will be lower-cased and accents will be removed. If False, accented and upper-case characters are not transformed.
- Returns
- Series or Index of object.
Examples
>>> import cudf >>> ser = cudf.Series(["héllo, \tworld","ĂĆCĖÑTED","$99"]) >>> ser.str.normalize_characters() 0 hello , world 1 accented 2 $ 99 dtype: object >>> ser.str.normalize_characters(do_lower=False) 0 héllo , world 1 ĂĆCĖÑTED 2 $ 99 dtype: object
-
normalize_spaces(**kwargs)¶ Remove extra whitespace between tokens and trim whitespace from the beginning and the end of each string.
- Returns
- Series or Index of object.
Examples
>>> import cudf >>> ser = cudf.Series(["hello \t world"," test string "]) >>> ser.str.normalize_spaces() 0 hello world 1 test string dtype: object
-
pad(width, side='left', fillchar=' ', **kwargs)¶ Pad strings in the Series/Index up to width.
- Parameters
- width : int
Minimum width of resulting string; additional characters will be filled with the character defined in fillchar.
- side : {‘left’, ‘right’, ‘both’}, default ‘left’
Side from which to fill resulting string.
- fillchar : str, default ‘ ‘ (whitespace)
Additional character for filling, default is whitespace.
- Returns
- Series/Index of object
Returns a Series or Index in which each string is padded to at least width characters.
See also
rjust : Fills the left side of strings with an arbitrary character. Equivalent to Series.str.pad(side='left').
ljust : Fills the right side of strings with an arbitrary character. Equivalent to Series.str.pad(side='right').
center : Fills both sides of strings with an arbitrary character. Equivalent to Series.str.pad(side='both').
zfill : Pad strings in the Series/Index by prepending ‘0’ characters. Equivalent to Series.str.pad(side='left', fillchar='0').
Examples
>>> import cudf >>> s = cudf.Series(["caribou", "tiger"])
>>> s.str.pad(width=10) 0 caribou 1 tiger dtype: object
>>> s.str.pad(width=10, side='right', fillchar='-') 0 caribou--- 1 tiger----- dtype: object
>>> s.str.pad(width=10, side='both', fillchar='-') 0 -caribou-- 1 --tiger--- dtype: object
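The three side options line up with Python's built-in padding methods: side='left' fills on the left like str.rjust, side='right' fills on the right like str.ljust, and side='both' corresponds to str.center. The hypothetical helper below reproduces the three outputs above on the CPU. Python's str.center has its own rule for where an odd extra fill character goes; it happens to agree with the side='both' example here, but that is not guaranteed for every width.

```python
def pad(strings, width, side="left", fillchar=" "):
    """CPU analogue of Series.str.pad built on Python's str padding methods."""
    if side == "left":
        return [s.rjust(width, fillchar) for s in strings]   # fill on the left
    if side == "right":
        return [s.ljust(width, fillchar) for s in strings]   # fill on the right
    if side == "both":
        return [s.center(width, fillchar) for s in strings]  # fill on both sides
    raise ValueError("side must be 'left', 'right' or 'both'")


print(pad(["caribou", "tiger"], 10, side="right", fillchar="-"))  # ['caribou---', 'tiger-----']
print(pad(["caribou", "tiger"], 10, side="both", fillchar="-"))   # ['-caribou--', '--tiger---']
```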
-
partition(sep=' ', expand=True, **kwargs)¶ Split the string at the first occurrence of sep.
This method splits the string at the first occurrence of sep, and returns 3 elements containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return 3 elements containing the string itself, followed by two empty strings.
- Parameters
- sep : str, default ‘ ‘ (whitespace)
String to split on.
- Returns
- DataFrame or MultiIndex
Returns a DataFrame / MultiIndex
See also
rpartition : Split the string at the last occurrence of sep.
split : Split strings around given separators.
Notes
The parameter expand is not yet supported and will raise a NotImplementedError if anything other than the default value is set.
Examples
>>> import cudf >>> s = cudf.Series(['Linda van der Berg', 'George Pitt-Rivers']) >>> s 0 Linda van der Berg 1 George Pitt-Rivers dtype: object
>>> s.str.partition() 0 1 2 0 Linda van der Berg 1 George Pitt-Rivers
To partition by something different than a space:
>>> s.str.partition('-') 0 1 2 0 Linda van der Berg 1 George Pitt - Rivers
Also available on indices:
>>> idx = cudf.core.index.StringIndex(['X 123', 'Y 999']) >>> idx StringIndex(['X 123' 'Y 999'], dtype='object')
Which will create a MultiIndex:
>>> idx.str.partition() MultiIndex(levels=[0 X 1 Y dtype: object, 0 dtype: object, 0 123 1 999 dtype: object], codes= 0 1 2 0 0 0 0 1 1 0 1)
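The three-part result described above is exactly what Python's str.partition returns for a single string, and str.rpartition gives the last-occurrence variant listed under See also. A CPU sketch with hypothetical wrapper names; tuples stand in for the three output columns.

```python
def partition(strings, sep=" "):
    # (before, sep, after) at the first occurrence, or (s, "", "") if sep is absent.
    return [s.partition(sep) for s in strings]


def rpartition(strings, sep=" "):
    # Same shape, split at the last occurrence, or ("", "", s) if sep is absent.
    return [s.rpartition(sep) for s in strings]


names = ['Linda van der Berg', 'George Pitt-Rivers']
print(partition(names))
# [('Linda', ' ', 'van der Berg'), ('George', ' ', 'Pitt-Rivers')]
print(partition(names, '-'))
# [('Linda van der Berg', '', ''), ('George Pitt', '-', 'Rivers')]
```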
-
porter_stemmer_measure(**kwargs)¶ Compute the Porter Stemmer measure for each string. The Porter Stemmer algorithm is described here.
- Returns
- Series or Index of int32.
Examples
>>> import cudf
>>> ser = cudf.Series(["hello", "super"])
>>> ser.str.porter_stemmer_measure()
0    1
1    2
dtype: int32
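Porter's measure m counts the vowel-consonant pairs in a word written as [C](VC)^m[V], where runs of consonants (C) and vowels (V) are collapsed, and where y counts as a vowel only when it follows a consonant. The sketch below follows that textbook definition and assumes lowercase ASCII input; it is an illustration rather than cuDF's implementation, and porter_measure is a hypothetical name.

```python
def _is_consonant(word: str, i: int) -> bool:
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        # 'y' is a consonant at the start or after a vowel, a vowel after a consonant
        return i == 0 or not _is_consonant(word, i - 1)
    return True


def porter_measure(word: str) -> int:
    """Hypothetical helper: Porter's m for one lowercase ASCII word."""
    collapsed = ""
    for i in range(len(word)):
        tag = "C" if _is_consonant(word, i) else "V"
        if not collapsed or collapsed[-1] != tag:
            collapsed += tag           # collapse runs such as "CC" into "C"
    return collapsed.count("VC")       # [C](VC)^m[V] -> m


print([porter_measure(w) for w in ["hello", "super"]])  # [1, 2]
```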
-
replace(pat, repl, n=-1, case=None, flags=0, regex=True, **kwargs)¶ Replace occurrences of pattern/regex in the Series/Index with some other string. Equivalent to str.replace() or re.sub().
- Parameters
- patstr or list-like
String(s) to be replaced as a character sequence or regular expression.
- replstr or list-like
String(s) to be used as replacement.
- nint, default -1 (all)
Number of replacements to make from the start.
- regexbool, default True
If True, assumes the pattern is a regular expression. If False, treats the pattern as a literal string.
- Returns
- Series/Index of str dtype
A copy of the object with all matching occurrences of pat replaced by repl.
Notes
The parameters case and flags are not yet supported and will raise a NotImplementedError if anything other than the default value is set.
Examples
>>> import cudf
>>> s = cudf.Series(['foo', 'fuz', None])
>>> s
0     foo
1     fuz
2    None
dtype: object
When pat is a string and regex is True (the default), the given pat is compiled as a regex. When repl is a string, it replaces matching regex patterns as with re.sub(). Null (None) values in the Series are left as is:
>>> s.str.replace('f.', 'ba', regex=True)
0     bao
1     baz
2    None
dtype: object
When pat is a string and regex is False, every occurrence of pat is replaced with repl as with str.replace():
>>> s.str.replace('f.', 'ba', regex=False)
0     foo
1     fuz
2    None
dtype: object
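The regex flag decides whether each element goes through re.sub() or str.replace(). This pure-Python sketch mirrors that split for a single pattern; the helper name replace_strings is hypothetical, and it assumes n is either -1 or a positive count, because n=0 means "replace all" to re.sub but "replace none" to str.replace.

```python
import re
from typing import List, Optional


def replace_strings(values: List[Optional[str]], pat: str, repl: str,
                    n: int = -1, regex: bool = True) -> List[Optional[str]]:
    """Hypothetical helper mirroring Series.str.replace for one pattern."""
    out: List[Optional[str]] = []
    for v in values:
        if v is None:
            out.append(None)                  # nulls are left as is
        elif regex:
            # re.sub uses count=0 for "replace all", so map n=-1 onto it
            out.append(re.sub(pat, repl, v, count=0 if n < 0 else n))
        else:
            # str.replace already treats a count of -1 as "replace all"
            out.append(v.replace(pat, repl, n))
    return out


print(replace_strings(['foo', 'fuz', None], 'f.', 'ba'))               # ['bao', 'baz', None]
print(replace_strings(['foo', 'fuz', None], 'f.', 'ba', regex=False))  # ['foo', 'fuz', None]
```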
-
replace_tokens(targets, replacements, delimiter=None, **kwargs)¶ The target tokens are searched for within each string in the series and replaced with the corresponding replacements if found. Tokens are identified by the delimiter character provided.
- Parameters
- targetsarray-like, Sequence or Series
The tokens to search for inside each string.
- replacementsarray-like, Sequence, Series or str
The strings to substitute for each target token found. Alternatively, this can be a single str instance, which is then used as the replacement for every target token found.
- delimiterstr
The character used to locate the tokens of each string. Default is whitespace.
- Returns
- Series or Index of object.
Examples
>>> import cudf
>>> sr = cudf.Series(["this is me", "theme music", ""])
>>> targets = cudf.Series(["is", "me"])
>>> sr.str.replace_tokens(targets=targets, replacements="_")
0       this _ _
1    theme music
2
dtype: object
>>> sr = cudf.Series(["this;is;me", "theme;music", ""])
>>> sr.str.replace_tokens(targets=targets, replacements=":")
0     this;is;me
1    theme;music
2
dtype: object
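Only whole tokens are replaced, which is why "is" inside "this" and "me" inside "theme" survive, and why the semicolon-separated strings are unchanged when the default whitespace delimiter is used. The sketch below illustrates that rule in pure Python; the helper name replace_tokens_py is hypothetical, and it assumes delimiters are copied to the output unchanged and that the default splits on runs of whitespace.

```python
import re
from typing import List, Optional, Sequence, Union


def replace_tokens_py(values: List[Optional[str]], targets: Sequence[str],
                      replacements: Union[str, Sequence[str]],
                      delimiter: Optional[str] = None) -> List[Optional[str]]:
    """Hypothetical helper: replace whole tokens, keeping delimiters."""
    if isinstance(replacements, str):
        mapping = {t: replacements for t in targets}   # one replacement for all
    else:
        mapping = dict(zip(targets, replacements))     # pairwise replacements
    # A capturing group makes re.split keep the delimiters at odd positions
    splitter = re.compile(r"(\s+)" if delimiter is None
                          else "(" + re.escape(delimiter) + ")")
    out: List[Optional[str]] = []
    for v in values:
        if v is None:
            out.append(None)
            continue
        parts = splitter.split(v)
        for i in range(0, len(parts), 2):              # even positions are tokens
            parts[i] = mapping.get(parts[i], parts[i])
        out.append("".join(parts))
    return out


print(replace_tokens_py(["this is me", "theme music", ""], ["is", "me"], "_"))
# ['this _ _', 'theme music', '']
```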
-
replace_with_backrefs(pat, repl, **kwargs)¶ Use the repl back-ref template to create a new string with the extracted elements found using the pat expression.
- Parameters
- patstr
Regex with groupings that identify the sections to extract. This should not be a compiled regex.
- replstr
String template containing back-reference indicators.
- Returns
- Series/Index of str dtype
Examples
>>> import cudf
>>> s = cudf.Series(["A543", "Z756"])
>>> s.str.replace_with_backrefs(r'(\d)(\d)', r'V\2\1')
0    AV453
1    ZV576
dtype: object
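For a single element, the same template substitution can be written with Python's re.sub, which also understands \1 and \2 back-references. The sketch is for illustration only (the helper name is hypothetical). It also shows why the patterns above are raw strings: in a plain literal, '\2' and '\1' would become control characters rather than back-references.

```python
import re
from typing import List, Optional


def replace_with_backrefs_py(values: List[Optional[str]], pat: str,
                             repl: str) -> List[Optional[str]]:
    """Hypothetical helper: apply a back-reference template per element."""
    return [None if v is None else re.sub(pat, repl, v) for v in values]


# Raw strings keep "\2" and "\1" as back-references instead of octal escapes
print(replace_with_backrefs_py(["A543", "Z756"], r"(\d)(\d)", r"V\2\1"))
# ['AV453', 'ZV576']
```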
-
rfind(sub, start=0, end=None, **kwargs)¶ Return the highest index in each string in the Series/Index where the substring is fully contained between [start:end]. Return -1 on failure. Equivalent to standard str.rfind().
- Parameters
- substr
Substring being searched.
- startint
Left edge index.
- endint
Right edge index.
- Returns
- Series or Index of int
See also
find: Return the lowest index in each string.
Examples
>>> import cudf
>>> s = cudf.Series(["abc", "hello world", "rapids ai"])
>>> s.str.rfind('a')
0    0
1   -1
2    7
dtype: int32
Using start and end parameters.
>>> s.str.rfind('a', start=2, end=5)
0   -1
1   -1
2   -1
dtype: int32
-
rindex(sub, start=0, end=None, **kwargs)¶ Return the highest index in each string where the substring is fully contained between [start:end]. This is the same as str.rfind except that, instead of returning -1, it raises a ValueError when the substring is not found.
- Parameters
- substr
Substring being searched.
- startint
Left edge index.
- endint
Right edge index.
- Returns
- Series or Index of int
Examples
>>> import cudf
>>> s = cudf.Series(['abc', 'a', 'b', 'ddb'])
>>> s.str.rindex('b')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found
Parameters such as start and end can also be used.
>>> s = cudf.Series(['abc', 'abb', 'ab', 'ddb'])
>>> s.str.rindex('b', start=1, end=5)
0    1
1    2
2    1
3    2
dtype: int32
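rfind and rindex differ only in how a miss is reported. The pure-Python sketch below mirrors both on plain lists using str.rfind, whose start/end bounds follow slice notation (so end=None means "to the end"). The helper names are hypothetical, and rindex_strings assumes, as the first rindex example suggests, that a single miss anywhere makes the whole call raise.

```python
from typing import List, Optional


def rfind_strings(values: List[Optional[str]], sub: str, start: int = 0,
                  end: Optional[int] = None) -> List[Optional[int]]:
    """Hypothetical helper mirroring Series.str.rfind."""
    return [None if v is None else v.rfind(sub, start, end) for v in values]


def rindex_strings(values: List[str], sub: str, start: int = 0,
                   end: Optional[int] = None) -> List[int]:
    """Hypothetical helper mirroring Series.str.rindex (no nulls assumed)."""
    positions = [v.rfind(sub, start, end) for v in values]
    if -1 in positions:
        # One element without the substring fails the whole call
        raise ValueError("substring not found")
    return positions


print(rfind_strings(["abc", "hello world", "rapids ai"], "a"))        # [0, -1, 7]
print(rindex_strings(['abc', 'abb', 'ab', 'ddb'], 'b', start=1, end=5))  # [1, 2, 1, 2]
```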
-
rjust(width, fillchar=' ', **kwargs)¶ Fill the left side of strings in the Series/Index with an additional character. Equivalent to str.rjust().
- Parameters
- widthint
Minimum width of resulting string; additional characters will be filled with fillchar.
- fillcharstr, default ‘ ‘ (whitespace)
Additional character for filling, default is whitespace.
- Returns
- Series/Index of str dtype
Returns Series or Index.
Examples
>>> import cudf
>>> s = cudf.Series(["hello world", "rapids ai"])
>>> s.str.rjust(20, fillchar="_")
0    _________hello world
1    ___________rapids ai
dtype: object
>>> s = cudf.Series(["a", "", "ab", "__"])
>>> s.str.rjust(1, fillchar="-")
0     a
1     -
2    ab
3    __
dtype: object
-
rpartition(sep=' ', expand=True, **kwargs)¶ Split the string at the last occurrence of sep.
This method splits the string at the last occurrence of sep, and returns 3 elements containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return 3 elements containing two empty strings, followed by the string itself.
- Parameters
- sepstr, default ‘ ‘ (whitespace)
String to split on.
- Returns
- DataFrame or MultiIndex
Returns a DataFrame when called on a Series, or a MultiIndex when called on an Index.
Notes
The parameter expand is not yet supported and will raise a NotImplementedError if anything other than the default value is set.
Examples
>>> import cudf
>>> s = cudf.Series(['Linda van der Berg', 'George Pitt-Rivers'])
>>> s
0    Linda van der Berg
1    George Pitt-Rivers
dtype: object
>>> s.str.rpartition()
               0  1            2
0  Linda van der            Berg
1         George     Pitt-Rivers
Also available on indices:
>>> idx = cudf.core.index.StringIndex(['X 123', 'Y 999'])
>>> idx
StringIndex(['X 123' 'Y 999'], dtype='object')
Which will create a MultiIndex:
>>> idx.str.rpartition()
MultiIndex(levels=[0    X
1    Y
dtype: object, 0     
dtype: object, 0    123
1    999
dtype: object],
codes=   0  1  2
0  0  0  0
1  1  0  1)
-
rsplit(pat=None, n=-1, expand=None, **kwargs)¶ Split strings around given separator/delimiter.
Splits the string in the Series/Index from the end, at the specified delimiter string. Equivalent to str.rsplit().
- Parameters
- patstr, default ‘ ‘ (space)
String to split on; regular expressions are not yet supported.
- nint, default -1 (all)
Limit number of splits in output. None, 0, and -1 will all be interpreted as “all splits”.
- Returns
- DataFrame or MultiIndex
Returns a DataFrame/MultiIndex with each split as a column.
See also
split: Split strings around given separator/delimiter.
str.split: Standard library version for split.
str.rsplit: Standard library version for rsplit.
Notes
The parameter expand is not yet supported and will raise a NotImplementedError if anything other than the default value is set. The handling of the n keyword depends on the number of found splits:
- If found splits > n, make first n splits only.
- If found splits <= n, make all splits.
- If for a certain row the number of found splits < n, append None for padding up to n.
Examples
>>> import cudf
>>> data = ["this is a regular sentence",
...         "https://docs.python.org/3/tutorial/index.html",
...         None]
>>> s = cudf.Series(data)
>>> s.str.rsplit(n=2)
                                               0        1         2
0                                      this is a  regular  sentence
1  https://docs.python.org/3/tutorial/index.html     None      None
2                                           None     None      None
For slightly more complex use cases like splitting the html document name from a url, a combination of parameter settings can be used.
>>> s.str.rsplit("/", n=1, expand=True)
                                    0           1
0          this is a regular sentence        None
1  https://docs.python.org/3/tutorial  index.html
2                                None        None
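The column layout in these results (and in split below) comes from splitting each row independently and then padding shorter rows with None. The pure-Python sketch below reproduces the outputs on plain lists; the helper name split_to_columns is hypothetical, and two details are assumptions inferred from the examples rather than documented behavior: the column count is the largest number of pieces found in any row, and pat=None splits on runs of whitespace as Python's str.split does.

```python
from typing import List, Optional


def split_to_columns(values: List[Optional[str]], pat: Optional[str] = None,
                     n: Optional[int] = -1,
                     from_right: bool = False) -> List[List[Optional[str]]]:
    """Hypothetical helper: split each string, then pad rows with None."""
    # None, 0 and -1 all mean "all splits"; Python uses -1 for that
    maxsplit = n if n is not None and n > 0 else -1
    rows: List[List[Optional[str]]] = []
    for v in values:
        if v is None:
            rows.append([])                        # filled with None below
        elif from_right:
            rows.append(v.rsplit(pat, maxsplit))
        else:
            rows.append(v.split(pat, maxsplit))
    width = max((len(r) for r in rows), default=0) or 1
    return [r + [None] * (width - len(r)) for r in rows]


data = ["this is a regular sentence",
        "https://docs.python.org/3/tutorial/index.html", None]
for row in split_to_columns(data, "/", n=1, from_right=True):
    print(row)
```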
-
rstrip(to_strip=None, **kwargs)¶ Remove trailing characters.
Strip whitespaces (including newlines) or a set of specified characters from the right side of each string in the Series/Index. Equivalent to str.rstrip().
- Parameters
- to_stripstr or None, default None
Specifies the set of characters to be removed. All combinations of this set of characters will be stripped. If None, whitespace is removed.
- Returns
- Series/Index of str dtype
Returns Series or Index.
Examples
>>> import cudf
>>> s = cudf.Series(['1. Ant. ', '2. Bee!\n', '3. Cat?\t', None])
>>> s
0     1. Ant. 
1    2. Bee!\n
2    3. Cat?\t
3         None
dtype: object
>>> s.str.rstrip('.!? \n\t')
0    1. Ant
1    2. Bee
2    3. Cat
3      None
dtype: object
-
slice(start=None, stop=None, step=None, **kwargs)¶ Slice substrings from each element in the Series or Index.
- Parameters
- startint, optional
Start position for slice operation.
- stopint, optional
Stop position for slice operation.
- stepint, optional
Step size for slice operation.
- Returns
- Series/Index of str dtype
Series or Index from sliced substring from original string object.
See also
slice_replace: Replace a slice with a string.
get: Return element at position. Equivalent to Series.str.slice(start=i, stop=i+1) with i being the position.
Examples
>>> import cudf
>>> s = cudf.Series(["koala", "fox", "chameleon"])
>>> s
0        koala
1          fox
2    chameleon
dtype: object
>>> s.str.slice(start=1)
0        oala
1          ox
2    hameleon
dtype: object
>>> s.str.slice(start=-1)
0    a
1    x
2    n
dtype: object
>>> s.str.slice(stop=2)
0    ko
1    fo
2    ch
dtype: object
>>> s.str.slice(step=2)
0      kaa
1       fx
2    caeen
dtype: object
>>> s.str.slice(start=0, stop=5, step=3)
0    kl
1     f
2    cm
dtype: object
-
slice_from(starts, stops, **kwargs)¶ Return substring of each string using positions for each string.
The starts and stops parameters are of Column type.
- Parameters
- startsSeries
Beginning position of each string to extract. Default is the beginning of each string.
- stopsSeries
Ending position of each string to extract. Default is the end of each string. Use -1 to extract to the end of that string.
- Returns
- Series/Index of str dtype
A substring of each string using positions for each string.
Examples
>>> import cudf
>>> s = cudf.Series(["hello", "there"])
>>> s
0    hello
1    there
dtype: object
>>> starts = cudf.Series([1, 3])
>>> stops = cudf.Series([5, 5])
>>> s.str.slice_from(starts, stops)
0    ello
1      re
dtype: object
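Unlike slice, slice_from takes a separate start and stop for every row. In pure Python that is a zip over three sequences with an ordinary slice per element, where a stop of -1 is translated to "until the end". The sketch is illustrative only; slice_from_py is a hypothetical name.

```python
from typing import List, Optional, Sequence


def slice_from_py(values: List[Optional[str]], starts: Sequence[int],
                  stops: Sequence[int]) -> List[Optional[str]]:
    """Hypothetical helper: per-row substring extraction."""
    out: List[Optional[str]] = []
    for v, start, stop in zip(values, starts, stops):
        if v is None:
            out.append(None)
        else:
            # stop == -1 means "to the end of this string", not "drop last char"
            out.append(v[start:None if stop == -1 else stop])
    return out


print(slice_from_py(["hello", "there"], [1, 3], [5, 5]))  # ['ello', 're']
```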
-
slice_replace(start=None, stop=None, repl=None, **kwargs)¶ Replace the specified section of each string with a new string.
- Parameters
- startint, optional
Beginning position of the string to replace. Default is the beginning of each string.
- stopint, optional
Ending position of the string to replace. Default is the end of each string.
- replstr, optional
String to insert into the specified position values.
- Returns
- Series/Index of str dtype
A new string with the specified section of the string replaced with repl string.
See also
slice: Just slicing without replacement.
Examples
>>> import cudf
>>> s = cudf.Series(['a', 'ab', 'abc', 'abdc', 'abcde'])
>>> s
0        a
1       ab
2      abc
3     abdc
4    abcde
dtype: object
Specify just start, meaning the section from start to the end of the string is replaced with repl.
>>> s.str.slice_replace(1, repl='X')
0    aX
1    aX
2    aX
3    aX
4    aX
dtype: object
Specify just stop, meaning the section from the beginning of the string up to stop is replaced with repl, and the rest of the string is kept.
>>> s.str.slice_replace(stop=2, repl='X')
0       X
1       X
2      Xc
3     Xdc
4    Xcde
dtype: object
Specify start and stop, meaning the slice from start to stop is replaced with repl. Everything before start and after stop is kept as is.
>>> s.str.slice_replace(start=1, stop=3, repl='X')
0      aX
1      aX
2      aX
3     aXc
4    aXde
dtype: object
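Each result above is "text before start" + repl + "text after stop". This pure-Python sketch reproduces the three examples; the helper name is hypothetical, and it deliberately does not model the case where stop is smaller than start, which this page does not describe.

```python
from typing import List, Optional


def slice_replace_py(values: List[Optional[str]], start: Optional[int] = None,
                     stop: Optional[int] = None,
                     repl: Optional[str] = None) -> List[Optional[str]]:
    """Hypothetical helper: replace v[start:stop] with repl in each string."""
    repl = "" if repl is None else repl
    out: List[Optional[str]] = []
    for v in values:
        if v is None:
            out.append(None)
            continue
        head = v[:start] if start is not None else ""   # kept prefix
        tail = v[stop:] if stop is not None else ""     # kept suffix
        out.append(head + repl + tail)
    return out


s = ['a', 'ab', 'abc', 'abdc', 'abcde']
print(slice_replace_py(s, start=1, stop=3, repl='X'))
# ['aX', 'aX', 'aX', 'aXc', 'aXde']
```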
-
split(pat=None, n=-1, expand=None, **kwargs)¶ Split strings around given separator/delimiter.
Splits the string in the Series/Index from the beginning, at the specified delimiter string. Equivalent to str.split().
- Parameters
- patstr, default ‘ ‘ (space)
String to split on; regular expressions are not yet supported.
- nint, default -1 (all)
Limit number of splits in output. None, 0, and -1 will all be interpreted as “all splits”.
- Returns
- DataFrame
Returns a DataFrame with each split as a column.
See also
rsplit: Splits string around given separator/delimiter, starting from the right.
str.split: Standard library version for split.
str.rsplit: Standard library version for rsplit.
Notes
The parameter expand is not yet supported and will raise a NotImplementedError if anything other than the default value is set. The handling of the n keyword depends on the number of found splits:
- If found splits > n, make first n splits only.
- If found splits <= n, make all splits.
- If for a certain row the number of found splits < n, append None for padding up to n.
Examples
>>> import cudf
>>> data = ["this is a regular sentence",
...         "https://docs.python.org/index.html", None]
>>> s = cudf.Series(data)
>>> s
0            this is a regular sentence
1    https://docs.python.org/index.html
2                                  None
dtype: object
The n parameter can be used to limit the number of splits on the delimiter.
>>> s.str.split(n=2)
                                    0     1                   2
0                                this    is  a regular sentence
1  https://docs.python.org/index.html  None                None
2                                None  None                None
The pat parameter can be used to split by other characters.
>>> s.str.split(pat="/")
                            0     1                2           3
0  this is a regular sentence  None             None        None
1                      https:        docs.python.org  index.html
2                        None  None             None        None
-
startswith(pat, **kwargs)¶ Test if the start of each string element matches a pattern.
Equivalent to str.startswith().
- Parameters
- patstr or list-like
If pat is a str, evaluates whether each string of the series starts with pat. If pat is list-like, evaluates whether self[i] starts with pat[i]. Regular expressions are not accepted.
- Returns
- Series or Index of bool
A Series of booleans indicating whether the given pattern matches the start of each string element.
Examples
>>> import cudf
>>> s = cudf.Series(['bat', 'Bear', 'cat', None])
>>> s
0     bat
1    Bear
2     cat
3    None
dtype: object
>>> s.str.startswith('b')
0     True
1    False
2    False
3     null
dtype: bool
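The two accepted forms of pat (one string for every row, or a list-like compared row by row) can be sketched in pure Python with str.startswith. This is an illustration with a hypothetical helper name; Python None stands in for the null that cuDF displays, and the comparison is a plain, case-sensitive prefix test with no regular expressions.

```python
from typing import List, Optional, Sequence, Union


def startswith_strings(values: List[Optional[str]],
                       pat: Union[str, Sequence[str]]) -> List[Optional[bool]]:
    """Hypothetical helper mirroring Series.str.startswith on a plain list."""
    if isinstance(pat, str):
        pats = [pat] * len(values)      # one prefix for every element
    else:
        pats = list(pat)                # list-like: element i uses pat[i]
    # Case-sensitive prefix test; None stands in for a null result
    return [None if v is None else v.startswith(p) for v, p in zip(values, pats)]


print(startswith_strings(['bat', 'Bear', 'cat', None], 'b'))
# [True, False, False, None]
```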
-
strip(to_strip=None, **kwargs)¶ Remove leading and trailing characters.
Strip whitespaces (including newlines) or a set of specified characters from each string in the Series/Index from left and right sides. Equivalent to str.strip().
- Parameters
- to_stripstr or None, default None
Specifies the set of characters to be removed. All combinations of this set of characters will be stripped. If None, whitespace is removed.
- Returns
- Series/Index of str dtype
Returns Series or Index.
Examples
>>> import cudf
>>> s = cudf.Series(['1. Ant. ', '2. Bee!\n', '3. Cat?\t', None])
>>> s
0     1. Ant. 
1    2. Bee!\n
2    3. Cat?\t
3         None
dtype: object
>>> s.str.strip()
0    1. Ant.
1    2. Bee!
2    3. Cat?
3       None
dtype: object
>>> s.str.strip('123.!? \n\t')
0     Ant
1     Bee
2     Cat
3    None
dtype: object
-
subword_tokenize(hash_file, max_length=64, stride=48, do_lower=True, do_truncate=False, max_num_strings=100, max_num_chars=100000, max_rows_tensor=500, **kwargs)¶ Run CUDA BERT subword tokenizer on cuDF strings column. Encodes words to token ids using vocabulary from a pretrained tokenizer.
- Parameters
- hash_filestr
Path to hash file containing vocabulary of words with token-ids.
- max_lengthint, Default is 64
Limits the length of the sequence returned. If tokenized string is shorter than max_length, output will be padded with 0s. If the tokenized string is longer than max_length and do_truncate == False, there will be multiple returned sequences containing the overflowing token-ids.
- strideint, Default is 48
If do_truncate == False and the tokenized string is larger than max_length, the sequences containing the overflowing token-ids can contain duplicated token-ids from the main sequence. If max_length is equal to stride there are no duplicated-id tokens. If stride is 80% of max_length, 20% of the first sequence will be repeated on the second sequence and so on until the entire sentence is encoded.
- do_lowerbool, Default is True
If set to true, original text will be lowercased before encoding.
- do_truncatebool, Default is False
If set to true, strings will be truncated and padded to max_length. Each input string will result in exactly one output sequence. If set to false, there may be multiple output sequences when the max_length is smaller than generated tokens.
- max_num_stringsint, Default is 100
The maximum number of strings to be encoded.
- max_num_charsint, Default is 100000
The maximum number of characters in the input strings column.
- max_rows_tensorint, Default is 500
The maximum number of rows in the output.
- Returns
- token-idsColumn
The token-ids for each string padded with 0s to max_length.
- attention-maskColumn
The mask for token-ids result where corresponding positions identify valid token-id values.
- metadataColumn
Each row contains the index id of the original string and the first and last index of the token-ids that are non-padded and non-overlapping.
Examples
>>> import cudf
>>> ser = cudf.Series(['this is the', 'best book'])
>>> tokens, masks, metadata = ser.str.subword_tokenize("bert_hash_table.txt")
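The max_length/stride description above reduces to simple window arithmetic: each overflow row starts stride token-ids after the previous one, so consecutive rows share max_length - stride ids, and the final row is padded with 0s. The sketch below illustrates only that arithmetic on made-up token ids. The helper name window_token_ids is hypothetical, and it does not reproduce cuDF's vocabulary hashing, special tokens, or metadata layout.

```python
from typing import List, Tuple


def window_token_ids(ids: List[int], max_length: int,
                     stride: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Hypothetical helper: (token_rows, attention_masks) for one string."""
    if not 0 < stride <= max_length:
        raise ValueError("expected 0 < stride <= max_length")
    rows: List[List[int]] = []
    masks: List[List[int]] = []
    start = 0
    while True:
        chunk = ids[start:start + max_length]
        pad = max_length - len(chunk)
        rows.append(chunk + [0] * pad)               # pad token-ids with 0s
        masks.append([1] * len(chunk) + [0] * pad)   # 1 marks a real token-id
        if start + max_length >= len(ids):
            break                                    # whole string covered
        start += stride                              # overlap = max_length - stride
    return rows, masks


rows, masks = window_token_ids(list(range(1, 11)), max_length=4, stride=3)
print(rows)   # [[1, 2, 3, 4], [4, 5, 6, 7], [7, 8, 9, 10]]
```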
-
swapcase(**kwargs)¶ Change each lowercase character to uppercase and vice versa. This only applies to ASCII characters at this time.
Equivalent to str.swapcase().
Returns : Series or Index of object
See also
lower: Converts all characters to lowercase.
upper: Converts all characters to uppercase.
title: Converts first character of each word to uppercase and remaining to lowercase.
capitalize: Converts first character to uppercase and remaining to lowercase.
Examples
>>> import cudf
>>> data = ['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe']
>>> s = cudf.Series(data)
>>> s
0                 lower
1              CAPITALS
2    this is a sentence
3              SwApCaSe
dtype: object
>>> s.str.swapcase()
0                 LOWER
1              capitals
2    THIS IS A SENTENCE
3              sWaPcAsE
dtype: object
-
title(**kwargs)¶ Uppercase the first letter of each word and lowercase the rest. This only applies to ASCII characters at this time.
Equivalent to str.title().
Returns : Series or Index of object
See also
lower: Converts all characters to lowercase.
upper: Converts all characters to uppercase.
capitalize: Converts first character to uppercase and remaining to lowercase.
swapcase: Converts uppercase to lowercase and lowercase to uppercase.
Examples
>>> import cudf
>>> data = ['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe']
>>> s = cudf.Series(data)
>>> s
0                 lower
1              CAPITALS
2    this is a sentence
3              SwApCaSe
dtype: object
>>> s.str.title()
0                 Lower
1              Capitals
2    This Is A Sentence
3              Swapcase
dtype: object
-
token_count(delimiter=' ', **kwargs)¶ Each string is split into tokens using the provided delimiter. The returned integer sequence is the number of tokens in each string.
- Parameters
- delimiterstr or list of strs, Default is whitespace.
The characters or strings used to locate the split points of each string.
- Returns
- Series or Index.
Examples
>>> import cudf
>>> ser = cudf.Series(["hello world", "goodbye", ""])
>>> ser.str.token_count()
0    2
1    1
2    0
dtype: int32
-
tokenize(delimiter=' ', **kwargs)¶ Each string is split into tokens using the provided delimiter(s). The sequence returned contains the tokens in the order they were found.
- Parameters
- delimiterstr or list of strs, Default is whitespace.
The string used to locate the split points of each string.
- Returns
- Series or Index of object.
Examples
>>> import cudf
>>> data = ["hello world", "goodbye world", "hello goodbye"]
>>> ser = cudf.Series(data)
>>> ser.str.tokenize()
0      hello
1      world
2    goodbye
3      world
4      hello
5    goodbye
dtype: object
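With the default whitespace delimiter, token_count and tokenize correspond to counting and flattening the pieces returned by Python's str.split(), which splits on runs of whitespace and yields no tokens for an empty string. A pure-Python sketch with hypothetical helper names; treating consecutive delimiters as a single split point is an assumption, not something stated above.

```python
from typing import List


def token_count_py(values: List[str]) -> List[int]:
    """Hypothetical helper: number of whitespace-separated tokens per string."""
    # str.split() with no argument splits on whitespace runs and drops empties
    return [len(v.split()) for v in values]


def tokenize_py(values: List[str]) -> List[str]:
    """Hypothetical helper: all tokens of all strings, in order, flattened."""
    return [tok for v in values for tok in v.split()]


print(token_count_py(["hello world", "goodbye", ""]))  # [2, 1, 0]
print(tokenize_py(["hello world", "goodbye world", "hello goodbye"]))
```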
-
translate(table, **kwargs)¶ Map all characters in the string through the given mapping table.
Equivalent to standard str.translate().
- Parameters
- tabledict
Table is a mapping of Unicode ordinals to Unicode ordinals, strings, or None. Unmapped characters are left untouched. str.maketrans() is a helper function for making translation tables.
- Returns
- Series or Index.
Examples
>>> import cudf
>>> data = ['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe']
>>> s = cudf.Series(data)
>>> s.str.translate({'a': "1"})
0                 lower
1              CAPITALS
2    this is 1 sentence
3              SwApC1Se
dtype: object
>>> s.str.translate({'a': "1", "e": "#"})
0                 low#r
1              CAPITALS
2    this is 1 s#nt#nc#
3              SwApC1S#
dtype: object
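The examples pass single-character string keys, while Python's str.translate() expects ordinal keys; the standard helper str.maketrans() converts one form into the other. The pure-Python sketch below reproduces the outputs above with a hypothetical helper name; whether cuDF performs the same conversion internally is not stated here.

```python
from typing import Dict, List, Optional, Union


def translate_strings(values: List[Optional[str]],
                      table: Dict[Union[str, int], Optional[str]]) -> List[Optional[str]]:
    """Hypothetical helper mirroring Series.str.translate on a plain list."""
    # str.maketrans accepts single-character string keys as well as ordinals
    trans = str.maketrans(table)
    return [None if v is None else v.translate(trans) for v in values]


data = ['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe']
print(translate_strings(data, {'a': "1", "e": "#"}))
# ['low#r', 'CAPITALS', 'this is 1 s#nt#nc#', 'SwApC1S#']
```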
-
upper(**kwargs)¶ Convert each string to uppercase. This only applies to ASCII characters at this time.
Equivalent to str.upper().
Returns : Series or Index of object
See also
lower: Converts all characters to lowercase.
title: Converts first character of each word to uppercase and remaining to lowercase.
capitalize: Converts first character to uppercase and remaining to lowercase.
swapcase: Converts uppercase to lowercase and lowercase to uppercase.
Examples
>>> import cudf
>>> data = ['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe']
>>> s = cudf.Series(data)
>>> s
0                 lower
1              CAPITALS
2    this is a sentence
3              SwApCaSe
dtype: object
>>> s.str.upper()
0                 LOWER
1              CAPITALS
2    THIS IS A SENTENCE
3              SWAPCASE
dtype: object
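"This only applies to ASCII characters" matters for non-ASCII text: Python's str.upper() is Unicode-aware (for example, it turns "ß" into "SS"). The sketch below shows ASCII-only case conversion with translation tables. It assumes that non-ASCII characters are passed through unchanged, which is one plausible reading of the note above rather than confirmed cuDF behavior, and the helper names are hypothetical.

```python
import string

# Translation tables that touch only the 26 ASCII letters
_TO_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)
_TO_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)
_SWAP = str.maketrans(string.ascii_lowercase + string.ascii_uppercase,
                      string.ascii_uppercase + string.ascii_lowercase)


def ascii_upper(s: str) -> str:
    return s.translate(_TO_UPPER)


def ascii_lower(s: str) -> str:
    return s.translate(_TO_LOWER)


def ascii_swapcase(s: str) -> str:
    return s.translate(_SWAP)


print(ascii_upper("straße"), "straße".upper())  # STRAßE STRASSE
```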
-
url_decode(**kwargs)¶ Returns a URL-decoded format of each string. No format checking is performed. All characters are expected to be encoded as UTF-8 hex values.
- Returns
- Series or Index.
Examples
>>> import cudf
>>> s = cudf.Series(['A%2FB-C%2FD', 'e%20f.g', '4-5%2C6'])
>>> s.str.url_decode()
0    A/B-C/D
1      e f.g
2      4-5,6
dtype: object
>>> data = ["https%3A%2F%2Frapids.ai%2Fstart.html",
...         "https%3A%2F%2Fmedium.com%2Frapids-ai"]
>>> s = cudf.Series(data)
>>> s.str.url_decode()
0    https://rapids.ai/start.html
1    https://medium.com/rapids-ai
dtype: object
-
url_encode(**kwargs)¶ Returns a URL-encoded format of each string. No format checking is performed. All characters are encoded except for ASCII letters, digits, and these characters: '.', '_', '-', '~'. Encoding converts to hex using UTF-8 encoded bytes.
- Returns
- Series or Index.
Examples
>>> import cudf
>>> s = cudf.Series(['A/B-C/D', 'e f.g', '4-5,6'])
>>> s.str.url_encode()
0    A%2FB-C%2FD
1        e%20f.g
2        4-5%2C6
dtype: object
>>> data = ["https://rapids.ai/start.html",
...         "https://medium.com/rapids-ai"]
>>> s = cudf.Series(data)
>>> s.str.url_encode()
0    https%3A%2F%2Frapids.ai%2Fstart.html
1    https%3A%2F%2Fmedium.com%2Frapids-ai
dtype: object
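The encoding rule above (keep ASCII letters, digits and '.', '_', '-', '~'; percent-encode everything else as UTF-8 bytes) is what Python's urllib.parse.quote does when safe="" is passed. The sketch reproduces both examples on plain lists; the helper names are hypothetical and this is not cuDF's GPU implementation.

```python
from typing import List, Optional
from urllib.parse import quote, unquote


def url_encode_py(values: List[Optional[str]]) -> List[Optional[str]]:
    # safe="" makes quote() encode "/" too; letters, digits and "_.-~" stay as is
    return [None if v is None else quote(v, safe="") for v in values]


def url_decode_py(values: List[Optional[str]]) -> List[Optional[str]]:
    # unquote() turns %XX sequences back into UTF-8 text ("+" is left alone)
    return [None if v is None else unquote(v) for v in values]


print(url_encode_py(['A/B-C/D', 'e f.g', '4-5,6']))
# ['A%2FB-C%2FD', 'e%20f.g', '4-5%2C6']
```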
-
wrap(width, **kwargs)¶ Wrap long strings in the Series/Index to be formatted in paragraphs with length less than a given width.
- Parameters
- widthint
Maximum line width.
- Returns
- Series or Index
Notes
The parameters expand_tabs, replace_whitespace, drop_whitespace, break_long_words and break_on_hyphens are not yet supported and will raise a NotImplementedError if they are set to any value.
This method currently achieves behavior matching R's stringr library str_wrap function. The equivalent pandas implementation can be obtained using the following parameter settings:
- expand_tabs = False
- replace_whitespace = True
- drop_whitespace = True
- break_long_words = False
- break_on_hyphens = False
Examples
>>> import cudf
>>> data = ['line to be wrapped', 'another line to be wrapped']
>>> s = cudf.Series(data)
>>> s.str.wrap(12)
0             line to be\nwrapped
1    another line\nto be\nwrapped
dtype: object
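The parameter settings listed above correspond to options of Python's textwrap.TextWrapper, so the same output can be reproduced in plain Python. A minimal sketch with a hypothetical helper name:

```python
import textwrap
from typing import List, Optional


def wrap_strings(values: List[Optional[str]], width: int) -> List[Optional[str]]:
    """Hypothetical helper: wrap each string with the settings listed above."""
    wrapper = textwrap.TextWrapper(
        width=width,
        expand_tabs=False,
        replace_whitespace=True,
        drop_whitespace=True,
        break_long_words=False,
        break_on_hyphens=False,
    )
    # fill() joins the wrapped lines with "\n"
    return [None if v is None else wrapper.fill(v) for v in values]


print(wrap_strings(['line to be wrapped', 'another line to be wrapped'], 12))
# ['line to be\nwrapped', 'another line\nto be\nwrapped']
```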
-
zfill(width, **kwargs)¶ Pad strings in the Series/Index by prepending ‘0’ characters.
Strings in the Series/Index are padded with ‘0’ characters on the left of the string to reach a total string length width. Strings in the Series/Index with length greater or equal to width are unchanged.
- Parameters
- widthint
Minimum length of resulting string; strings with length less than width will be prepended with ‘0’ characters.
- Returns
- Series/Index of str dtype
Returns Series or Index with prepended ‘0’ characters.
Notes
Differs from str.zfill(), which has special handling for '+'/'-' in the string.
Examples
>>> import cudf
>>> s = cudf.Series(['-1', '1', '1000', None])
>>> s
0      -1
1       1
2    1000
3    None
dtype: object
Note that None is not a string, so it remains None. The minus sign in '-1' is treated as a regular character and the zero is added to the left of it (str.zfill() would have moved it to the left). 1000 remains unchanged as it is longer than width.
>>> s.str.zfill(3)
0     0-1
1     001
2    1000
3    None
dtype: object
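The difference from Python's built-in str.zfill() is easiest to see side by side. Plain left-padding with '0' (str.rjust) reproduces the results above, while str.zfill() keeps a leading sign in front of the zeros. The helper name is hypothetical.

```python
from typing import List, Optional


def zfill_plain(values: List[Optional[str]], width: int) -> List[Optional[str]]:
    """Hypothetical helper: pad with '0' on the left, sign treated as a character."""
    return [None if v is None else v.rjust(width, "0") for v in values]


print(zfill_plain(['-1', '1', '1000', None], 3))  # ['0-1', '001', '1000', None]
print('-1'.zfill(3))                              # -01 (sign kept in front)
```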
-
General Functions¶
-
cudf.core.reshape.concat(objs, axis=0, ignore_index=False, sort=None)¶ Concatenate DataFrames, Series, or Indices row-wise.
- Parameters
- objslist of DataFrame, Series, or Index
- axis{0/’index’, 1/’columns’}, default 0
The axis to concatenate along.
- ignore_indexbool, default False
Set True to ignore the index of the objs and provide a default range index instead.
- sortbool, default False
Sort non-concatenation axis if it is not already aligned.
- Returns
- A new object of like type with rows from each object in objs.
Examples
Combine two Series.
>>> import cudf
>>> s1 = cudf.Series(['a', 'b'])
>>> s2 = cudf.Series(['c', 'd'])
>>> s1
0    a
1    b
dtype: object
>>> s2
0    c
1    d
dtype: object
>>> cudf.concat([s1, s2])
0    a
1    b
0    c
1    d
dtype: object
Clear the existing index and reset it in the result by setting the ignore_index option to True.
>>> cudf.concat([s1, s2], ignore_index=True)
0    a
1    b
2    c
3    d
dtype: object
Combine two DataFrame objects with identical columns.
>>> df1 = cudf.DataFrame([['a', 1], ['b', 2]],
...                      columns=['letter', 'number'])
>>> df1
  letter  number
0      a       1
1      b       2
>>> df2 = cudf.DataFrame([['c', 3], ['d', 4]],
...                      columns=['letter', 'number'])
>>> df2
  letter  number
0      c       3
1      d       4
>>> cudf.concat([df1, df2])
  letter  number
0      a       1
1      b       2
0      c       3
1      d       4
Combine DataFrame objects with overlapping columns and return everything. Columns outside the intersection will be filled with null values.
>>> df3 = cudf.DataFrame([['c', 3, 'cat'], ['d', 4, 'dog']],
...                      columns=['letter', 'number', 'animal'])
>>> df3
  letter  number animal
0      c       3    cat
1      d       4    dog
>>> cudf.concat([df1, df3], sort=False)
  letter  number animal
0      a       1   None
1      b       2   None
0      c       3    cat
1      d       4    dog
Combine DataFrame objects horizontally along the x axis by passing in axis=1.
>>> df4 = cudf.DataFrame([['bird', 'polly'], ['monkey', 'george']],
...                      columns=['animal', 'name'])
>>> df4
   animal    name
0    bird   polly
1  monkey  george
>>> cudf.concat([df1, df4], axis=1)
  letter  number  animal    name
0      a       1    bird   polly
1      b       2  monkey  george
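The row-wise case with overlapping columns amounts to an outer join on column names followed by stacking, with missing cells set to null. The sketch below models DataFrames as dicts of equal-length lists (an assumption made only for illustration) and keeps columns in first-seen order, matching sort=False in the example; concat_rows is a hypothetical name.

```python
from typing import Any, Dict, List

Frame = Dict[str, List[Any]]


def concat_rows(frames: List[Frame]) -> Frame:
    """Hypothetical helper: stack column-dicts, outer-joining the columns."""
    columns: List[str] = []
    for frame in frames:
        for name in frame:
            if name not in columns:
                columns.append(name)          # keep first-seen column order
    result: Frame = {name: [] for name in columns}
    for frame in frames:
        nrows = len(next(iter(frame.values()))) if frame else 0
        for name in columns:
            # Columns missing from this frame are filled with None (null)
            result[name].extend(frame.get(name, [None] * nrows))
    return result


df1 = {'letter': ['a', 'b'], 'number': [1, 2]}
df3 = {'letter': ['c', 'd'], 'number': [3, 4], 'animal': ['cat', 'dog']}
print(concat_rows([df1, df3]))
```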
-
cudf.core.reshape.get_dummies(df, prefix=None, prefix_sep='_', dummy_na=False, columns=None, cats={}, sparse=False, drop_first=False, dtype='uint8')¶ Returns a dataframe whose columns are the one-hot encodings of all columns in df.
- Parameters
- dfcudf.DataFrame
DataFrame to encode.
- prefixstr, dict, or sequence, optional
Prefix to append. Either a str (to apply a constant prefix), a dict mapping column names to prefixes, or a sequence of prefixes with the same length as the number of columns. If not supplied, defaults to the empty string.
- prefix_sepstr, dict, or sequence, optional, default ‘_’
Separator to use when appending prefixes.
- dummy_naboolean, optional
Add a column to indicate Nones; if False, Nones are ignored.
- catsdict, optional
Dictionary mapping column names to sequences of integers representing that column's categories. See cudf.DataFrame.one_hot_encoding for more information. If not supplied, the categories will be computed.
- sparseboolean, optional
Currently a non-functional argument in RAPIDS.
- drop_firstboolean, optional
Currently a non-functional argument in RAPIDS.
- columnssequence of str, optional
Names of columns to encode. If not provided, will attempt to encode all columns. Note this is different from pandas default behavior, which encodes all columns with dtype object or categorical.
- dtypestr, optional
Output dtype, default ‘uint8’.
Examples
>>> import cudf
>>> df = cudf.DataFrame({"a": ["value1", "value2", None], "b": [0, 0, 0]})
>>> cudf.get_dummies(df)
   b  a_value1  a_value2
0  0         1         0
1  0         0         1
2  0         0         0
>>> cudf.get_dummies(df, dummy_na=True)
   b  a_None  a_value1  a_value2
0  0       0         1         0
1  0       0         0         1
2  0       1         0         0
>>> import numpy as np
>>> df = cudf.DataFrame({"a": cudf.Series([1, 2, np.nan, None],
...                                       nan_as_null=False)})
>>> df
      a
0   1.0
1   2.0
2   NaN
3  null
>>> cudf.get_dummies(df, dummy_na=True, columns=["a"])
   a_1.0  a_2.0  a_nan  a_null
0      1      0      0       0
1      0      1      0       0
2      0      0      1       0
3      0      0      0       1
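For a single column, one-hot encoding produces one 0/1 indicator column per distinct non-null value, plus an optional indicator for missing values when dummy_na=True. The pure-Python sketch below reproduces the first two examples for column "a". It is a hypothetical helper: the sorted category order and the placement of the missing-value column first are assumptions chosen to match those examples, and NaN versus null is not distinguished.

```python
from typing import Dict, List, Optional


def one_hot(values: List[Optional[str]], prefix: str, prefix_sep: str = "_",
            dummy_na: bool = False) -> Dict[str, List[int]]:
    """Hypothetical helper: one-hot encode one column held in a list."""
    categories = sorted({v for v in values if v is not None})
    columns: Dict[str, List[int]] = {}
    if dummy_na:
        # Indicator column for missing values, labelled "None" as in the example
        columns[f"{prefix}{prefix_sep}None"] = [int(v is None) for v in values]
    for cat in categories:
        columns[f"{prefix}{prefix_sep}{cat}"] = [int(v == cat) for v in values]
    return columns


print(one_hot(["value1", "value2", None], "a", dummy_na=True))
```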
-
cudf.core.reshape.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None)¶ Unpivots a DataFrame from wide format to long format, optionally leaving identifier variables set.
- Parameters
- frameDataFrame
- id_varstuple, list, or ndarray, optional
Column(s) to use as identifier variables. default: None
- value_varstuple, list, or ndarray, optional
Column(s) to unpivot. default: all columns that are not set as id_vars.
- var_namescalar
Name to use for the variable column. default: frame.columns.name or ‘variable’
- value_namestr
Name to use for the value column. default: ‘value’
- Returns
- outDataFrame
Melted result
- Difference from pandas:
Does not support ‘col_level’ because cuDF does not have multi-index
Examples
>>> import cudf
>>> df = cudf.DataFrame({'A': ['a', 'b', 'c'],
...                      'B': [1, 3, 5],
...                      'C': [2, 4, 6]})
>>> df
   A  B  C
0  a  1  2
1  b  3  4
2  c  5  6
>>> cudf.melt(df, id_vars=['A'], value_vars=['B'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
>>> cudf.melt(df, id_vars=['A'], value_vars=['B', 'C'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
3  a        C      2
4  b        C      4
5  c        C      6
The names of ‘variable’ and ‘value’ columns can be customized:
>>> cudf.melt(df, id_vars=['A'], value_vars=['B'],
...           var_name='myVarname', value_name='myValname')
   A myVarname  myValname
0  a         B          1
1  b         B          3
2  c         B          5
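Melting emits one block of rows per unpivoted column: the identifier values are repeated, the variable column records which column the value came from, and the value column holds the cell. The sketch below models a DataFrame as a dict of equal-length lists (an illustration-only assumption) and reproduces the two-column example above; melt_py is a hypothetical name.

```python
from typing import Any, Dict, List, Optional

Frame = Dict[str, List[Any]]


def melt_py(frame: Frame, id_vars: List[str],
            value_vars: Optional[List[str]] = None,
            var_name: str = "variable", value_name: str = "value") -> Frame:
    """Hypothetical helper: unpivot a dict-of-lists from wide to long format."""
    if value_vars is None:
        value_vars = [c for c in frame if c not in id_vars]
    out: Frame = {c: [] for c in id_vars}
    out[var_name] = []
    out[value_name] = []
    nrows = len(next(iter(frame.values())))
    for var in value_vars:                  # one block of rows per column
        for i in range(nrows):
            for c in id_vars:
                out[c].append(frame[c][i])  # identifier values are repeated
            out[var_name].append(var)
            out[value_name].append(frame[var][i])
    return out


df = {'A': ['a', 'b', 'c'], 'B': [1, 3, 5], 'C': [2, 4, 6]}
print(melt_py(df, id_vars=['A'], value_vars=['B', 'C']))
```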
-
cudf.core.reshape.merge_sorted(objs, keys=None, by_index=False, ignore_index=False, ascending=True, na_position='last')¶ Merge a list of sorted DataFrame or Series objects.
Dataframes/Series in objs list MUST be pre-sorted by columns listed in keys, or by the index (if by_index=True).
- Parameters
- objslist of DataFrame, Series, or Index
- keyslist, default None
List of column names to sort by. If None, all columns are used (ignored if by_index=True).
- by_indexbool, default False
Use the index for sorting. The keys input is ignored if True.
- ignore_indexbool, default False
Drop and ignore the index during the merge. A default range index is used in the output dataframe.
- ascendingbool, default True
If True, sorting is in ascending order; otherwise, it is descending.
- na_position{‘first’, ‘last’}, default ‘last’
‘first’ places nulls at the beginning; ‘last’ places nulls at the end.
- Returns
- A new, lexicographically sorted, DataFrame/Series.
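Because every input must already be sorted, combining them is a k-way merge rather than a full sort. The standard library's heapq.merge performs exactly that lazily and stably. The sketch below applies it to rows modelled as dicts; the helper name is hypothetical, nulls (and therefore na_position) are not handled, and with ascending=False the inputs are assumed to be sorted in descending order.

```python
import heapq
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def merge_sorted_rows(objs: List[Iterable[Row]], keys: List[str],
                      ascending: bool = True) -> List[Row]:
    """Hypothetical helper: k-way merge of inputs already sorted by `keys`."""
    def sort_key(row: Row):
        return tuple(row[k] for k in keys)   # lexicographic over the key columns
    # heapq.merge never re-sorts its inputs; reverse=True expects descending inputs
    return list(heapq.merge(*objs, key=sort_key, reverse=not ascending))


a = [{"k": 1, "v": "a"}, {"k": 4, "v": "b"}]
b = [{"k": 2, "v": "c"}, {"k": 3, "v": "d"}]
print([row["v"] for row in merge_sorted_rows([a, b], keys=["k"])])  # ['a', 'c', 'd', 'b']
```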
-
cudf.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=None, format=None, exact=True, unit='ns', infer_datetime_format=False, origin='unix', cache=True)¶ Convert argument to datetime.
- Parameters
- argint, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like
The object to convert to a datetime.
- errors{‘ignore’, ‘raise’, ‘coerce’, ‘warn’}, default ‘raise’
If ‘raise’, then invalid parsing will raise an exception.
If ‘coerce’, then invalid parsing will be set as NaT.
If ‘warn’, then the last exceptions are printed as warnings and the input is returned.
If ‘ignore’, then invalid parsing will return the input.
- dayfirstbool, default False
Specify a date parse order if arg is str or its list-likes. If True, parses dates with the day first, e.g. 10/11/12 is parsed as 2012-11-10. Warning: dayfirst=True is not strict, but will prefer to parse with day first (this is a known bug, based on dateutil behavior).
- formatstr, default None
The strftime format to parse time, e.g. “%d/%m/%Y”; note that “%f” will parse all the way up to nanoseconds. See the strftime documentation for more information on choices: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior.
- unitstr, default ‘ns’
The unit (D, s, ms, us, ns) of arg when arg is an integer or float number. Values are counted from the origin (the Unix epoch start). For example, with unit='ms' and origin='unix' (the default), arg is interpreted as the number of milliseconds since the Unix epoch start.
- infer_datetime_formatbool, default False
If True and no format is given, attempt to infer the format of the datetime strings, and if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by ~5-10x.
- Returns
- datetime
If parsing succeeded, the return type depends on the input:
list-like: DatetimeIndex
Series: Series of datetime64 dtype
scalar: Timestamp
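The ‘raise’, ‘coerce’ and ‘ignore’ modes of errors, together with an explicit format, can be illustrated with the standard library. This is a hypothetical sketch (`parse_dates_sketch` is not part of cuDF) built on `datetime.strptime`; None stands in for NaT and the ‘warn’ mode is not modeled.

```python
from datetime import datetime
from typing import List, Optional


def parse_dates_sketch(values: List[str], fmt: str, errors: str = "raise"):
    """Model of the documented 'raise', 'coerce' and 'ignore' modes.

    Hypothetical helper; None stands in for NaT, 'warn' is not modeled.
    """
    parsed: List[Optional[datetime]] = []
    for value in values:
        try:
            parsed.append(datetime.strptime(value, fmt))
        except ValueError:
            if errors == "raise":
                raise  # invalid parsing raises an exception
            if errors == "ignore":
                return values  # invalid parsing returns the input unchanged
            if errors == "coerce":
                parsed.append(None)  # the invalid entry becomes NaT (None here)
            else:
                raise ValueError(f"unsupported errors mode: {errors!r}")
    return parsed


# "%d/%m/%Y" reads the day first, so "10/11/2012" is 10 November 2012.
print(parse_dates_sketch(["10/11/2012", "not a date"], "%d/%m/%Y", errors="coerce"))
```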
Examples
Assembling a datetime from multiple columns of a DataFrame. The keys can be common abbreviations like [‘year’, ‘month’, ‘day’, ‘minute’, ‘second’, ‘ms’, ‘us’, ‘ns’] or plurals of the same.
>>> import cudf
>>> df = cudf.DataFrame({'year': [2015, 2016],
...                      'month': [2, 3],
...                      'day': [4, 5]})
>>> cudf.to_datetime(df)
0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]
>>> cudf.to_datetime(1490195805, unit='s')
numpy.datetime64('2017-03-22T15:16:45.000000000')
>>> cudf.to_datetime(1490195805433502912, unit='ns')
numpy.datetime64('2017-03-22T15:16:45.433502912')
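The integer arithmetic behind unit and origin=’unix’ can be checked by hand: multiply by the length of one unit in nanoseconds and count from 1970-01-01 UTC. The sketch below is a hypothetical helper, not cuDF code. Python's `datetime` stops at microsecond resolution, so the sub-microsecond remainder is returned separately rather than silently dropped.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Nanoseconds in one step of each documented `unit` value.
NS_PER_UNIT = {"D": 86_400 * 10**9, "s": 10**9, "ms": 10**6, "us": 10**3, "ns": 1}


def epoch_offset_sketch(value: int, unit: str = "ns"):
    """Count `value` steps of `unit` from the Unix epoch (origin='unix').

    Returns (datetime at microsecond precision, leftover nanoseconds).
    """
    total_ns = value * NS_PER_UNIT[unit]  # exact integer arithmetic
    micros, leftover_ns = divmod(total_ns, 1_000)
    return UNIX_EPOCH + timedelta(microseconds=micros), leftover_ns


print(epoch_offset_sketch(1490195805, unit="s"))
print(epoch_offset_sketch(1490195805433502912, unit="ns"))
```

Both epoch values used in the examples above fall on 2017-03-22 at 15:16:45 UTC; the nanosecond one carries an extra .433502912 seconds.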
Index¶
-
class
cudf.core.index.Index(data=None, dtype=None, copy=False, name=None, tupleize_cols=True, **kwargs)¶ Immutable, ordered and sliceable sequence of labels. The basic object storing row labels for all cuDF objects.
- Parameters
- dataarray-like (1-dimensional)/ DataFrame
If it is a DataFrame, it will return a MultiIndex
- dtypeNumPy dtype, default None
If dtype is None, the dtype that best fits the data is inferred.
- copybool
Make a copy of input data.
- nameobject
Name to be stored in the index.
- tupleize_colsbool (default: True)
When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.
- Returns
- Index
cudf Index
Examples
>>> import cudf
>>> cudf.Index([1, 2, 3], dtype="uint64", name="a")
UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]}))
MultiIndex(levels=[0    1
1    2
dtype: int64, 0    2
1    3
dtype: int64],
codes=   a  b
0  0  0
1  1  1)
- Attributes
emptyIndicator whether Index is empty.
gpu_valuesView the data as a numba device array object
is_monotonicAlias for is_monotonic_increasing.
is_monotonic_decreasingReturn if the index is monotonic decreasing (only equal or decreasing) values.
is_monotonic_increasingReturn if the index is monotonic increasing (only equal or increasing) values.
is_uniqueReturn if the index has unique values.
nameReturns the name of the Index.
namesReturns a tuple containing the name of the Index.
ndimDimension of the data.
shapeReturns a tuple representing the dimensionality of the Index.
sizeReturn the number of elements in the underlying data.
valuesReturn an array representing the data in the Index.
values_hostReturn a numpy representation of the Index.
Methods
acos()Get Trigonometric inverse cosine, element-wise.
any()Return whether any element is True in the Index.
append(other)Append a collection of Index options together.
argsort([ascending])Return the integer indices that would sort the index.
asin()Get Trigonometric inverse sine, element-wise.
astype(dtype[, copy])Create an Index with values cast to dtypes.
atan()Get Trigonometric inverse tangent, element-wise.
clip([lower, upper, inplace, axis])Trim values at input threshold(s).
cos()Get Trigonometric cosine, element-wise.
difference(other[, sort])Return a new Index with elements from the index that are not in other.
drop_duplicates([keep])Return Index with duplicate values removed
dropna([how])Return an Index with null values removed.
equals(other)Determine if two Index objects contain the same elements.
exp()Get the exponential of all elements, element-wise.
fillna(value[, downcast])Fill null values with the specified value.
from_pandas(index[, nan_as_null])Convert from a Pandas Index.
get_level_values(level)Return an Index of values for requested level.
get_slice_bound(label, side, kind)Calculate slice bound that corresponds to given label.
interleave_columns()Interleave Series columns of a table into a single column.
isin(values)Return a boolean array where the index values are in values.
isna()Identify missing values.
isnull()Identify missing values.
join(other[, how, level, return_indexers, sort])Compute join_index and indexers to conform data structures to the new index.
log()Get the natural logarithm of all elements, element-wise.
mask(cond[, other, inplace])Replace values where the condition is True.
max()Return the maximum value of the Index.
memory_usage([deep])Memory usage of the values.
min()Return the minimum value of the Index.
notna()Identify non-missing values.
notnull()Identify non-missing values.
rank([axis, method, numeric_only, …])Compute numerical data ranks (1 through n) along axis.
rename(name[, inplace])Alter Index name.
repeat(repeats[, axis])Repeats elements consecutively.
sample([n, frac, replace, weights, …])Return a random sample of items from an axis of object.
scatter_by_map(map_index[, map_size, keep_index])Scatter to a list of dataframes.
searchsorted(values[, side, ascending, …])Find indices where elements should be inserted to maintain order
shift([periods, freq, axis, fill_value])Shift values by periods positions.
sin()Get Trigonometric sine, element-wise.
sort_values([return_indexer, ascending, key])Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
sqrt()Get the non-negative square-root of all elements, element-wise.
sum()Return the sum of all values of the Index.
take(indices)Gather only the specific subset of indices
tan()Get Trigonometric tangent, element-wise.
tile(count)Repeats the rows from self DataFrame count times to form a new DataFrame.
to_array([fillna])Get a dense numpy array for the data.
to_arrow()Convert Index to a PyArrow Array.
to_dlpack()Converts a cuDF object into a DLPack tensor.
to_pandas()Convert to a Pandas Index.
to_series([index, name])Create a Series with both index and values equal to the index keys.
unique()Return unique values in the index.
where(cond[, other])Replace values where the condition is False.
replace
-
acos()¶ Get Trigonometric inverse cosine, element-wise.
The inverse of cos, so that if y = x.cos(), then x = y.acos() for x in [0, π].
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64
acos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629
acos operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0, 1.5707963267948966, 1.266103672779499], dtype='float64')
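The round trip x = x.cos().acos() holds only on the principal range [0, π]; outside it acos returns the angle in [0, π] with the same cosine. The pure-Python sketch below shows this with `math.acos`. Returning NaN for inputs outside [-1, 1] is an assumption about element-wise behavior, not something this page states for acos, and `acos_elementwise_sketch` is a hypothetical name.

```python
import math


def acos_elementwise_sketch(values):
    """Element-wise arccosine; NaN outside [-1, 1] (assumed behavior)."""
    # math.acos raises ValueError outside [-1, 1]; NaN is returned instead
    # to imitate element-wise array semantics.
    return [math.acos(v) if -1.0 <= v <= 1.0 else math.nan for v in values]


# acos inverts cos only on [0, pi]: 1.0 round-trips, 4.0 comes back as 2*pi - 4.
for x in (1.0, 4.0):
    print(x, acos_elementwise_sketch([math.cos(x)])[0])
```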
-
any()¶ Return whether any element is True in the Index.
-
append(other)¶ Append a collection of Index options together.
- Parameters
- otherIndex or list/tuple of indices
- Returns
- appendedIndex
Examples
>>> import cudf
>>> idx = cudf.Index([1, 2, 10, 100])
>>> idx
Int64Index([1, 2, 10, 100], dtype='int64')
>>> other = cudf.Index([200, 400, 50])
>>> other
Int64Index([200, 400, 50], dtype='int64')
>>> idx.append(other)
Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')
append also accepts a list of Index objects:
>>> idx.append([other, other])
Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
-
argsort(ascending=True, **kwargs)¶ Return the integer indices that would sort the index.
- Parameters
- ascendingbool, default True
If True, returns the indices for ascending order. If False, returns the indices for descending order.
- Returns
- array
A CuPy array of integer indices that would sort the index if used as an indexer.
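What argsort returns can be modeled in pure Python: the positions of the elements, ordered by the values at those positions. This hypothetical `argsort_sketch` uses a stable sort; how cuDF breaks ties between equal values is not stated on this page.

```python
from typing import List, Sequence


def argsort_sketch(values: Sequence, ascending: bool = True) -> List[int]:
    """Positions that would sort `values`; a stable pure-Python model."""
    # sorted() is stable, and with reverse=True equal values still keep
    # their original relative order.
    return sorted(range(len(values)), key=values.__getitem__, reverse=not ascending)


data = [10, 100, 1, 1000]
positions = argsort_sketch(data)
print(positions)                     # [2, 0, 1, 3]
print([data[p] for p in positions])  # [1, 10, 100, 1000]
```

Indexing the original values with the returned positions yields the sorted sequence, which is what "if used as an indexer" means above.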
-
asin()¶ Get Trigonometric inverse sine, element-wise.
The inverse of sine, so that if y = x.sin(), then x = y.asin() for x in [-π/2, π/2].
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.asin()
0   -1.570796
1    0.000000
2    1.570796
3    0.330314
4    0.523599
dtype: float64
asin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.asin()
      first    second
0 -1.570796  0.236190
1  0.000000  0.304693
2  0.523599  0.100167
asin operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64')
>>> index.asin()
Float64Index([-1.5707963267948966, 0.41151684606748806, 1.5707963267948966, 0.3046926540153975], dtype='float64')
-
astype(dtype, copy=False)¶ Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.
- Parameters
- dtypenumpy dtype
Use a numpy.dtype to cast entire Index object to.
- copybool, default False
If True, always return a newly allocated object. If False (the default) and internal requirements on dtype are satisfied, the original data is used to create a new Index, or the original Index is returned.
- Returns
- Index
Index with values cast to specified dtype.
-
atan()¶ Get Trigonometric inverse tangent, element-wise.
The inverse of tan, so that if y = x.tan(), then x = y.atan() for x in (-π/2, π/2).
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10])
>>> ser
0    -1.00000
1     0.00000
2     1.00000
3     0.32434
4     0.50000
5   -10.00000
dtype: float64
>>> ser.atan()
0   -0.785398
1    0.000000
2    0.785398
3    0.313635
4    0.463648
5   -1.471128
dtype: float64
atan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.atan()
      first    second
0 -0.785398  0.229864
1 -1.471128  0.291457
2  0.463648  1.471128
atan operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.atan()
Float64Index([-0.7853981633974483, 0.3805063771123649, 0.7853981633974483, 0.0, 0.2914567944778671], dtype='float64')
-
clip(lower=None, upper=None, inplace=False, axis=1)¶ Trim values at input threshold(s).
Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.
- Parameters
- lowerscalar or array_like, default None
Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.
- upperscalar or array_like, default None
Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series/Index, upper is expected to be a scalar or an array of size 1.
- inplacebool, default False
Whether to perform the operation in place on the data.
- Returns
- Clipped DataFrame/Series/Index/MultiIndex
Examples
>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']})
>>> df.clip(lower=[2, 'b'], upper=[3, 'c'])
   a  b
0  2  b
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=None, upper=[3, 'c'])
   a  b
0  1  a
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=[2, 'b'], upper=None)
   a  b
0  2  b
1  2  b
2  3  c
3  4  d
>>> df.clip(lower=2, upper=3, inplace=True)
>>> df
   a  b
0  2  2
1  2  3
2  3  3
3  3  3
>>> import cudf
>>> sr = cudf.Series([1, 2, 3, 4])
>>> sr.clip(lower=2, upper=3)
0    2
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=None, upper=3)
0    1
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True)
>>> sr
0    2
1    2
2    3
3    4
dtype: int64
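For scalar thresholds, clip reduces to two comparisons per element, with a None bound disabling that side. The hypothetical `clip_sketch` below models one column only; in the DataFrame examples above, each column receives its own (lower, upper) pair from the lists.

```python
def clip_sketch(values, lower=None, upper=None):
    """Scalar-threshold clipping of one column; None disables a bound."""
    clipped = []
    for value in values:
        if lower is not None and value < lower:
            value = lower  # below the minimum threshold
        if upper is not None and value > upper:
            value = upper  # above the maximum threshold
        clipped.append(value)
    return clipped


print(clip_sketch([1, 2, 3, 4], lower=2, upper=3))          # [2, 2, 3, 3]
print(clip_sketch(['a', 'b', 'c', 'd'], lower='b', upper='c'))  # ['b', 'b', 'c', 'c']
```

Strings compare lexicographically, which is why the string column in the DataFrame examples clips to 'b' and 'c'.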
-
cos()¶ Get Trigonometric cosine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.cos()
0    1.000000
1    0.947861
2    0.877583
3    0.525322
4   -0.448074
5   -0.598460
6   -0.283691
dtype: float64
cos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.cos()
      first    second
0  1.000000  0.862319
1  0.283662 -0.283691
2 -0.839072 -0.839039
3 -0.759688 -0.022097
cos operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.cos()
Float64Index([ 0.9210609940028851, 0.8623188722876839, -0.5984600690578581, -0.4480736161291701], dtype='float64')
-
difference(other, sort=None)¶ Return a new Index with elements from the index that are not in other.
This is the set difference of two Index objects.
- Parameters
- otherIndex or array-like
- sortFalse or None, default None
Whether to sort the resulting index. By default, cuDF attempts to sort the values but catches any TypeError raised by comparing incomparable elements.
None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.
False : Do not sort the result.
- Returns
- differenceIndex
Examples
>>> import cudf
>>> idx1 = cudf.Index([2, 1, 3, 4])
>>> idx1
Int64Index([2, 1, 3, 4], dtype='int64')
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx2
Int64Index([3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
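The set-difference semantics and the two sort settings can be modeled in pure Python. This is a hypothetical sketch (`difference_sketch` is not a cuDF function) that assumes hashable values and, like a set operation, keeps each surviving value once, in order of first appearance.

```python
def difference_sketch(left, right, sort=None):
    """Unique elements of `left` not in `right`; sort=None tries to sort."""
    exclude = set(right)
    seen = set()
    result = []
    for value in left:
        # Keep first-appearance order; skip excluded values and repeats.
        if value not in exclude and value not in seen:
            seen.add(value)
            result.append(value)
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            pass  # incomparable elements: fall back to the unsorted result
    elif sort is not False:
        raise ValueError("sort must be None or False")
    return result


print(difference_sketch([2, 1, 3, 4], [3, 4, 5, 6]))              # [1, 2]
print(difference_sketch([2, 1, 3, 4], [3, 4, 5, 6], sort=False))  # [2, 1]
```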
-
drop_duplicates(keep='first')¶ Return Index with duplicate values removed
- Parameters
- keep{‘first’, ‘last’, False}, default ‘first’
‘first’ : Drop duplicates except for the first occurrence.
‘last’ : Drop duplicates except for the last occurrence.
False : Drop all duplicates.
- Returns
- deduplicatedIndex
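The three keep options differ only in which occurrence of a repeated value survives. The hypothetical `drop_duplicates_sketch` below models them in pure Python and keeps surviving values at their original relative positions; this page does not state the output order cuDF uses, so that part is an assumption.

```python
from collections import Counter


def drop_duplicates_sketch(values, keep="first"):
    """Model of keep='first', keep='last' and keep=False."""
    if keep is False:
        counts = Counter(values)
        return [v for v in values if counts[v] == 1]  # drop every duplicate
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first', 'last' or False")
    # For 'last', scan from the end so the final occurrence is seen first.
    scan = values if keep == "first" else list(reversed(values))
    seen = set()
    kept = []
    for v in scan:
        if v not in seen:
            seen.add(v)
            kept.append(v)
    return kept if keep == "first" else kept[::-1]


data = [3, 1, 3, 2, 1]
print(drop_duplicates_sketch(data), drop_duplicates_sketch(data, "last"), drop_duplicates_sketch(data, False))
```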
-
dropna(how='any')¶ Return an Index with null values removed.
- Parameters
- how{‘any’, ‘all’}, default ‘any’
If the Index is a MultiIndex, drop the value when any or all levels are NaN.
- Returns
- validIndex
Examples
>>> import cudf
>>> index = cudf.Index(['a', None, 'b', 'c'])
>>> index
StringIndex(['a' None 'b' 'c'], dtype='object')
>>> index.dropna()
StringIndex(['a' 'b' 'c'], dtype='object')
Using dropna on a MultiIndex:
>>> midx = cudf.MultiIndex(
...     levels=[[1, None, 4, None], [1, 2, 5]],
...     codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...     names=["x", "y"],
... )
>>> midx
MultiIndex(levels=[0       1
1    null
2       4
3    null
dtype: int64, 0    1
1    2
2    5
dtype: int64],
codes=   x  y
0  0  0
1  0  2
2  1  1
3  2  1
4  3  0)
>>> midx.dropna()
MultiIndex(levels=[0    1
1    4
dtype: int64, 0    1
1    2
2    5
dtype: int64],
codes=   x  y
0  0  0
1  0  2
2  1  1)
-
property
empty¶ Indicator whether Index is empty.
True if Index is entirely empty (no items).
- Returns
- outbool
If Index is empty, return True, if not return False.
-
equals(other)¶ Determine if two Index objects contain the same elements.
- Returns
- out: bool
True if “other” is an Index and it has the same elements as calling index; False otherwise.
-
exp()¶ Get the exponential of all elements, element-wise.
Exponential is the inverse of the log function, so that x.exp().log() = x
- Returns
- DataFrame/Series/Index
Result of the element-wise exponential.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.exp()
0    3.678794e-01
1    1.000000e+00
2    2.718282e+00
3    1.383117e+00
4    1.648721e+00
5    4.539993e-05
6    2.688117e+43
dtype: float64
exp operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.exp()
      first        second
0  0.367879      1.263644
1  0.000045      1.349859
2  1.648721  22026.465795
exp operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.exp()
Float64Index([0.36787944117144233, 1.4918246976412703, 2.718281828459045, 1.0, 1.3498588075760032], dtype='float64')
-
fillna(value, downcast=None)¶ Fill null values with the specified value.
- Parameters
- valuescalar
Scalar value to use to fill nulls. This value cannot be list-like.
- downcastdict, default None
This parameter is currently non-functional.
- Returns
- filledIndex
Examples
>>> import cudf
>>> index = cudf.Index([1, 2, None, 4])
>>> index
Int64Index([1, 2, null, 4], dtype='int64')
>>> index.fillna(3)
Int64Index([1, 2, 3, 4], dtype='int64')
-
classmethod
from_pandas(index, nan_as_null=None)¶ Convert from a Pandas Index.
- Parameters
- indexPandas Index object
A Pandas Index object which has to be converted to cuDF Index.
- nan_as_nullbool, default None
If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.
- Raises
- TypeError for invalid input type.
Examples
>>> import cudf
>>> import pandas as pd
>>> import numpy as np
>>> data = [10, 20, 30, np.nan]
>>> pdi = pd.Index(data)
>>> cudf.core.index.Index.from_pandas(pdi)
Index(['10.0', '20.0', '30.0', 'null'], dtype='object')
>>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False)
Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
-
get_level_values(level)¶ Return an Index of values for requested level.
This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.
- Parameters
- levelint or str
It is either the integer position or the name of the level.
- Returns
- Index
Calling object, as there is only one level in the Index.
See also
cudf.core.multiindex.MultiIndex.get_level_valuesGet values for a level of a MultiIndex.
Notes
For Index, level should be 0, since there are no multiple levels.
Examples
>>> import cudf
>>> idx = cudf.core.index.StringIndex(["a", "b", "c"])
>>> idx.get_level_values(0)
StringIndex(['a' 'b' 'c'], dtype='object')
-
get_slice_bound(label, side, kind)¶ Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.
- Parameters
- labelobject
- side{‘left’, ‘right’}
- kind{‘ix’, ‘loc’, ‘getitem’}
- Returns
- int
Index of label.
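For a monotonically increasing index, the "leftmost" and "one-past-the-rightmost" positions are exactly what the standard `bisect` module computes. The sketch below assumes sorted labels, ignores kind, and uses a hypothetical name (`get_slice_bound_sketch`); it is not cuDF's implementation.

```python
from bisect import bisect_left, bisect_right


def get_slice_bound_sketch(sorted_labels, label, side):
    """Leftmost position of `label`, or one past its rightmost position."""
    if side == "left":
        return bisect_left(sorted_labels, label)
    if side == "right":
        return bisect_right(sorted_labels, label)
    raise ValueError("side must be 'left' or 'right'")


labels = [1, 2, 2, 3]
start = get_slice_bound_sketch(labels, 2, "left")   # 1
stop = get_slice_bound_sketch(labels, 2, "right")   # 3
print(labels[start:stop])  # every occurrence of the label: [2, 2]
```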
-
property
gpu_values¶ View the data as a numba device array object
-
interleave_columns()¶ Interleave Series columns of a table into a single column.
Converts the column major table cols into a row major column.
- Parameters
- colsinput Table containing columns to interleave.
- Returns
- The interleaved columns as a single column
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': ['A1', 'A2', 'A3'], 'b': ['B1', 'B2', 'B3']})
>>> df
    a   b
0  A1  B1
1  A2  B2
2  A3  B3
>>> df.interleave_columns()
0    A1
1    B1
2    A2
3    B2
4    A3
5    B3
dtype: object
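Converting column-major data to a single row-major column means emitting, for each row, that row's value from every column in turn. The hypothetical `interleave_columns_sketch` below models that ordering on plain Python lists.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def interleave_columns_sketch(columns: Sequence[Sequence[T]]) -> List[T]:
    """Flatten column-major data into one row-major list."""
    if not columns:
        return []
    n_rows = len(columns[0])
    if any(len(col) != n_rows for col in columns):
        raise ValueError("all columns must have the same length")
    # For each row, take that row's value from every column in turn.
    return [col[row] for row in range(n_rows) for col in columns]


print(interleave_columns_sketch([["A1", "A2", "A3"], ["B1", "B2", "B3"]]))
```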
-
property
is_monotonic¶ Alias for is_monotonic_increasing.
-
property
is_monotonic_decreasing¶ Return if the index is monotonic decreasing (only equal or decreasing) values.
-
property
is_monotonic_increasing¶ Return if the index is monotonic increasing (only equal or increasing) values.
-
property
is_unique¶ Return if the index has unique values.
-
isin(values)¶ Return a boolean array where the index values are in values.
Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.
- Parameters
- valuesset, list-like, Index
Sought values.
- Returns
- is_containedcupy array
CuPy array of boolean values.
-
isna()¶ Identify missing values. Alias for isnull
-
isnull()¶ Identify missing values.
-
join(other, how='left', level=None, return_indexers=False, sort=False)¶ Compute join_index and indexers to conform data structures to the new index.
- Parameters
- otherIndex.
- how{‘left’, ‘right’, ‘inner’, ‘outer’}, default ‘left’
- return_indexersbool, default False
- sortbool, default False
Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).
- Returns
- index
Examples
>>> import cudf
>>> lhs = cudf.DataFrame(
...     {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b']
... ).index
>>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index
>>> lhs.join(rhs, how='inner')
MultiIndex(levels=[0    1
1    3
dtype: int64, 0    2
1    4
dtype: int64],
codes=   a  b
0  1  1
1  0  0)
-
log()¶ Get the natural logarithm of all elements, element-wise.
Natural logarithm is the inverse of the exp function, so that x.log().exp() = x
- Returns
- DataFrame/Series/Index
Result of the element-wise natural logarithm.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.log()
0         NaN
1        -inf
2    0.000000
3   -1.125963
4   -0.693147
5         NaN
6    4.605170
dtype: float64
log operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.log()
      first    second
0       NaN -1.452434
1       NaN -1.203973
2 -0.693147  2.302585
log operation on Index:
>>> index = cudf.Index([10, 11, 500.0])
>>> index
Float64Index([10.0, 11.0, 500.0], dtype='float64')
>>> index.log()
Float64Index([2.302585092994046, 2.3978952727983707, 6.214608098422191], dtype='float64')
-
mask(cond, other=None, inplace=False)¶ Replace values where the condition is True.
- Parameters
- condbool Series/DataFrame, array-like
Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.
- other: scalar, list of scalars, Series/DataFrame
Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.
For a DataFrame, other must be a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.
For a Series, other must be a scalar or a Series-like object of the same length.
- inplacebool, default False
Whether to perform the operation in place on the data.
- Returns
- Same type as caller
Examples
>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.mask(df % 2 == 0, [-1, -1])
   A  B
0  1  3
1 -1  5
2  5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.mask(ser > 2, 10)
0    10
1    10
2     2
3     1
4     0
dtype: int64
>>> ser.mask(ser > 2)
0    null
1    null
2       2
3       1
4       0
dtype: int64
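For a single column, mask is a per-element choice between the original value and the corresponding entry of other, with a scalar other broadcast to every position. The hypothetical `mask_sketch` below models the Series case on plain lists, using None to stand in for null.

```python
def mask_sketch(values, cond, other=None):
    """Replace values where cond is True; None stands in for null."""
    if len(cond) != len(values):
        raise ValueError("cond must have the same length as values")
    if isinstance(other, (list, tuple)):
        if len(other) != len(values):
            raise ValueError("list-like other must match the length of values")
        replacements = list(other)
    else:
        replacements = [other] * len(values)  # broadcast a scalar
    return [rep if flag else val for val, flag, rep in zip(values, cond, replacements)]


ser = [4, 3, 2, 1, 0]
print(mask_sketch(ser, [v > 2 for v in ser], 10))  # [10, 10, 2, 1, 0]
print(mask_sketch(ser, [v > 2 for v in ser]))      # [None, None, 2, 1, 0]
```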
-
max()¶ Return the maximum value of the Index.
- Returns
- scalar
Maximum value.
See also
Index.minReturn the minimum value in an Index.
cudf.core.series.Series.maxReturn the maximum value in a Series.
cudf.core.dataframe.DataFrame.maxReturn the maximum values in a DataFrame.
Examples
>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.max()
3
-
memory_usage(deep=False)¶ Memory usage of the values.
- Parameters
- deepbool
Introspect the data deeply, interrogate object dtypes for system-level memory consumption.
- Returns
- bytes used
-
min()¶ Return the minimum value of the Index.
- Returns
- scalar
Minimum value.
See also
Index.maxReturn the maximum value in an Index.
cudf.core.series.Series.minReturn the minimum value in a Series.
cudf.core.dataframe.DataFrame.minReturn the minimum values in a DataFrame.
Examples
>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.min()
1
-
property
name¶ Returns the name of the Index.
-
property
names¶ Returns a tuple containing the name of the Index.
-
property
ndim¶ Dimension of the data. Apart from MultiIndex ndim is always 1.
-
notna()¶ Identify non-missing values. Alias for notnull.
-
notnull()¶ Identify non-missing values.
-
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)¶ Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.
- Parameters
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
Index to direct ranking.
- method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’
How to rank the group of records that have the same value (i.e. ties):
average : average rank of the group
min : lowest rank in the group
max : highest rank in the group
first : ranks assigned in order they appear in the array
dense : like ‘min’, but rank always increases by 1 between groups.
- numeric_onlybool, optional
For DataFrame objects, rank only numeric columns if set to True.
- na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’
How to rank NaN values:
keep : assign NaN rank to NaN values
top : assign smallest rank to NaN values if ascending
bottom : assign highest rank to NaN values if ascending.
- ascendingbool, default True
Whether or not the elements should be ranked in ascending order.
- pctbool, default False
Whether or not to display the returned rankings in percentile form.
- Returns
- same type as caller
Return a Series or DataFrame with data ranks as values.
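The five tie-breaking methods can be compared side by side with a small pure-Python model. This is a hypothetical sketch (`rank_sketch` is not a cuDF function) for one column without nulls; na_option, pct, numeric_only and axis are not modeled.

```python
from typing import List, Sequence


def rank_sketch(values: Sequence[float], method: str = "average",
                ascending: bool = True) -> List[float]:
    """Rank values 1..n, resolving ties with `method` (no nulls, no pct)."""
    n = len(values)
    # Stable sort of positions; reverse=True keeps ties in original order.
    order = sorted(range(n), key=values.__getitem__, reverse=not ascending)
    ranks = [0.0] * n
    start = 0
    dense_rank = 0
    while start < n:
        # Extend [start, end] over the run of equal values (one tie group).
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        dense_rank += 1
        for pos in range(start, end + 1):
            if method == "average":
                rank = (start + end) / 2 + 1
            elif method == "min":
                rank = start + 1
            elif method == "max":
                rank = end + 1
            elif method == "first":
                rank = pos + 1
            elif method == "dense":
                rank = dense_rank
            else:
                raise ValueError(f"unknown method: {method!r}")
            ranks[order[pos]] = float(rank)
        start = end + 1
    return ranks


for m in ("average", "min", "max", "first", "dense"):
    print(m, rank_sketch([3, 1, 4, 1, 5], m))
```

With the tied pair of 1s, average gives both 1.5, min gives both 1, max gives both 2, first gives 1 and 2 in order of appearance, and dense numbers the distinct values 1 through 4.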
-
rename(name, inplace=False)¶ Alter Index name.
Defaults to returning new index.
- Parameters
- namelabel
Name(s) to set.
- Returns
- Index
-
repeat(repeats, axis=None)¶ Repeats elements consecutively.
Returns a new object of the caller’s type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.
- Parameters
- repeatsint, or array of ints
The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.
- Returns
- Series/DataFrame/Index
A newly created object of same type as caller with repeated elements.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})
>>> df
   a   b
0  1  10
1  2  20
2  3  30
>>> df.repeat(3)
   a   b
0  1  10
0  1  10
0  1  10
1  2  20
1  2  20
1  2  20
2  3  30
2  3  30
2  3  30
Repeat on Series
>>> s = cudf.Series([0, 2])
>>> s
0    0
1    2
dtype: int64
>>> s.repeat([3, 4])
0    0
0    0
0    0
1    2
1    2
1    2
1    2
dtype: int64
>>> s.repeat(2)
0    0
0    0
1    2
1    2
dtype: int64
Repeat on Index
>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33, 33, 33, 33, 33, 55, 55, 55, 55, 55], dtype='int64')
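The repeats argument is either one count shared by every element or one count per element, as the Series examples above show. The hypothetical `repeat_sketch` below models both forms on a plain list (index labels are not modeled).

```python
from typing import List, Sequence, Union


def repeat_sketch(values: Sequence, repeats: Union[int, Sequence[int]]) -> List:
    """Repeat each element consecutively `repeats` times."""
    if isinstance(repeats, int):
        repeats = [repeats] * len(values)  # one count shared by all elements
    if len(repeats) != len(values):
        raise ValueError("repeats must be an int or match the length of values")
    if any(r < 0 for r in repeats):
        raise ValueError("repeats must be non-negative")
    return [v for v, r in zip(values, repeats) for _ in range(r)]


print(repeat_sketch([0, 2], [3, 4]))  # [0, 0, 0, 2, 2, 2, 2]
print(repeat_sketch([0, 2], 2))       # [0, 0, 2, 2]
```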
-
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)¶ Return a random sample of items from an axis of object.
You can use random_state for reproducibility.
- Parameters
- nint, optional
Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.
- fracfloat, optional
Fraction of axis items to return. Cannot be used with n.
- replacebool, default False
Allow or disallow sampling of the same row more than once. replace=True is not yet supported for axis=1/“columns”.
- weightsstr or ndarray-like, optional
Only supported for axis=1/“columns”.
- random_stateint or None, default None
Seed for the random number generator (if int), or None. If None, a random seed will be chosen.
- axis{0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index don’t support axis=1.
- Returns
- Series or DataFrame or Index
A new object of same type as caller containing n items randomly sampled from the caller object.
Examples
>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1
>>> sr = cudf.Series([1, 2, 3, 4, 5])
>>> sr.sample(10, replace=True)
1    4
3    1
2    4
0    5
0    1
4    5
4    1
0    2
0    3
3    2
dtype: int64
>>> df = cudf.DataFrame(
...     {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]})
>>> df.sample(2, axis=1)
   a  c
0  1  3
1  2  4
-
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)¶ Scatter to a list of dataframes.
Uses map_index to determine the destination of each row of the original DataFrame.
- Parameters
- map_indexSeries, str or list-like
Scatter assignment for each row
- map_sizeint
Length of output list. Must be greater than or equal to the number of unique values in map_index.
- keep_indexbool
Conserve original index values for each row
- Returns
- A list of cudf.DataFrame objects.
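Scattering by map amounts to bucketing each row into the output slot named by its map_index entry. The sketch below is hypothetical (`scatter_by_map_sketch` is not a cuDF function) and assumes map_index holds integer destinations in [0, map_size); defaulting map_size to the largest destination plus one is also an assumption, and keep_index is not modeled.

```python
def scatter_by_map_sketch(rows, map_index, map_size=None):
    """Bucket rows into map_size lists by their destination number."""
    if len(rows) != len(map_index):
        raise ValueError("map_index must have one entry per row")
    if map_size is None:
        map_size = max(map_index) + 1 if map_index else 0
    parts = [[] for _ in range(map_size)]
    for row, dest in zip(rows, map_index):
        if not 0 <= dest < map_size:
            raise ValueError(f"destination {dest} is outside [0, {map_size})")
        parts[dest].append(row)  # rows keep their original relative order
    return parts


print(scatter_by_map_sketch(["r0", "r1", "r2", "r3"], [1, 0, 1, 2]))
```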
-
searchsorted(values, side='left', ascending=True, na_position='last')¶ Find indices where elements should be inserted to maintain order
- Parameters
- valuesFrame (Shape must be consistent with self)
Values to be hypothetically inserted into self.
- sidestr {‘left’, ‘right’} optional, default ‘left’
If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.
- ascendingbool optional, default True
Sorted Frame is in ascending order (otherwise descending).
- na_positionstr {‘last’, ‘first’} optional, default ‘last’
Position of null values in sorted order
- Returns
- 1-D cupy array of insertion points
Examples
>>> import cudf
>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)
If the values are not monotonically sorted, wrong locations may be returned:
>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0  # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
...                             'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
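For a single sorted numeric column, the side and ascending rules can be modeled with the standard `bisect` module. This hypothetical `searchsorted_sketch` handles a descending column by negating it, which turns it into an ascending one; multi-column frames and na_position are not modeled. Like the real method, it silently returns a misleading position when the input is not sorted.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def searchsorted_sketch(sorted_values: Sequence[float], value: float,
                        side: str = "left", ascending: bool = True) -> int:
    """Insertion point for `value` that keeps `sorted_values` in order."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    search = bisect_left if side == "left" else bisect_right
    if ascending:
        return search(sorted_values, value)
    # A descending sequence becomes ascending once every element is negated,
    # and 'left' still means "before any equal elements".
    return search([-v for v in sorted_values], -value)


print([searchsorted_sketch([1, 2, 3], v) for v in (0, 4)])               # [0, 3]
print([searchsorted_sketch([1, 2, 3], v, side="right") for v in (1, 3)])  # [1, 3]
print(searchsorted_sketch([7, 5, 3, 1], 5, ascending=False))             # 1
```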
-
property
shape¶ Returns a tuple representing the dimensionality of the Index.
-
shift(periods=1, freq=None, axis=0, fill_value=None)¶ Shift values by periods positions.
-
sin()¶ Get Trigonometric sine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.sin()
0    0.000000
1    0.318683
2    0.479426
3    0.850904
4    0.893997
5   -0.801153
6    0.958916
dtype: float64
sin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.sin()
      first    second
0  0.000000 -0.506366
1 -0.958924  0.958916
2 -0.544021 -0.544072
3  0.650288 -0.999756
sin operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.sin()
Float64Index([-0.3894183423086505, -0.5063656411097588, 0.8011526357338306, 0.8939966636005579], dtype='float64')
-
property
size¶ Return the number of elements in the underlying data.
- Returns
- sizeSize of the DataFrame / Index / Series / MultiIndex
Examples
Size of an empty dataframe is 0.
>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0
DataFrame with values
>>> df = cudf.DataFrame({'a': [10, 11, 12],
...                      'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3
Size of an Index
>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Int64Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4
Size of a MultiIndex
>>> midx = cudf.MultiIndex(
...     levels=[["a", "b", "c", None], ["1", None, "5"]],
...     codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...     names=["x", "y"],
... )
>>> midx
MultiIndex(levels=[0       a
1       b
2       c
3    None
dtype: object, 0       1
1    None
2       5
dtype: object],
codes=   x  y
0  0  0
1  0  2
2  1  1
3  2  1
4  3  0)
>>> midx.size
5
-
sort_values(return_indexer=False, ascending=True, key=None)¶ Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
- Parameters
- return_indexer : bool, default False
Whether to also return the indices that would sort the index.
- ascending : bool, default True
Whether to sort the index values in ascending order.
- key : None, optional
This parameter is NON-FUNCTIONAL.
- Returns
- sorted_index : Index
Sorted copy of the index.
- indexer : cupy.ndarray, optional
The indices that the index itself was sorted by. Only returned if return_indexer is True.
See also
cudf.core.series.Series.sort_values : Sort values of a Series.
cudf.core.dataframe.DataFrame.sort_values : Sort values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([10, 100, 1, 1000]) >>> idx Int64Index([10, 100, 1, 1000], dtype='int64')
Sort values in ascending order (default behavior):
>>> idx.sort_values() Int64Index([1, 10, 100, 1000], dtype='int64')
Sort values in descending order, and also get the indices idx was sorted by:
>>> idx.sort_values(ascending=False, return_indexer=True) (Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2], dtype=int32))
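The result above can be reproduced on the host with a small pure-Python model. This is a sketch, not cuDF's implementation: the helper `sort_values_model` is hypothetical, assumes the values contain no nulls, and uses Python's stable `sorted` to compute the indexer that `return_indexer=True` exposes.

```python
def sort_values_model(values, ascending=True):
    """Hypothetical model of Index.sort_values(return_indexer=True).

    Returns (sorted_values, indexer), where indexer[k] is the original
    position of the k-th sorted value. Assumes no nulls.
    """
    # Order the positions 0..n-1 by the value stored at each position.
    # Python's sort is stable, also when reverse=True.
    indexer = sorted(range(len(values)), key=lambda i: values[i],
                     reverse=not ascending)
    sorted_values = [values[i] for i in indexer]
    return sorted_values, indexer


values = [10, 100, 1, 1000]  # same data as the Index example above
print(sort_values_model(values))                   # ([1, 10, 100, 1000], [2, 0, 1, 3])
print(sort_values_model(values, ascending=False))  # ([1000, 100, 10, 1], [3, 1, 0, 2])
```

The descending indexer matches the `array([3, 1, 0, 2])` in the example; cuDF returns it as a CuPy array rather than a Python list.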
-
sqrt()¶ Get the non-negative square-root of all elements, element-wise.
- Returns
- DataFrame/Series/Index
Result of the non-negative square-root of each element.
Examples
>>> import cudf >>> ser = cudf.Series([10, 25, 81, 1.0, 100]) >>> ser 0 10.0 1 25.0 2 81.0 3 1.0 4 100.0 dtype: float64 >>> ser.sqrt() 0 3.162278 1 5.000000 2 9.000000 3 1.000000 4 10.000000 dtype: float64
sqrt operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-10.0, 100, 625], ... 'second': [1, 2, 0.4]}) >>> df first second 0 -10.0 1.0 1 100.0 2.0 2 625.0 0.4 >>> df.sqrt() first second 0 NaN 1.000000 1 10.0 1.414214 2 25.0 0.632456
sqrt operation on Index:
>>> index = cudf.Index([-10.0, 100, 625]) >>> index Float64Index([-10.0, 100.0, 625.0], dtype='float64') >>> index.sqrt() Float64Index([nan, 10.0, 25.0], dtype='float64')
-
sum()¶ Return the sum of all values of the Index.
- Returns
- scalar
Sum of all values.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.sum() 6
-
take(indices)¶ Gather the elements at the given integer positions into a new Index.
- Parameters
- indices : array-like of integer positions of the values in this Index to gather.
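As an illustration of positional gathering, here is a pure-Python sketch. The helper `take_model` is hypothetical; it assumes `indices` holds non-negative integer positions and does not model how cuDF handles negative or out-of-range positions.

```python
def take_model(values, indices):
    """Hypothetical model of Index.take: gather values by integer position.

    Assumes each position is a non-negative int smaller than len(values).
    """
    n = len(values)
    for i in indices:
        if not 0 <= i < n:
            raise IndexError(f"position {i} is out of bounds for length {n}")
    # Positions may appear in any order and may repeat.
    return [values[i] for i in indices]


print(take_model([10, 20, 30, 40], [3, 0, 0]))  # [40, 10, 10]
```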
-
tan()¶ Get Trigonometric tangent, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360]) >>> ser 0 0.00000 1 0.32434 2 0.50000 3 45.00000 4 90.00000 5 180.00000 6 360.00000 dtype: float64 >>> ser.tan() 0 0.000000 1 0.336213 2 0.546302 3 1.619775 4 -1.995200 5 1.338690 6 -3.380140 dtype: float64
tan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15], ... 'second': [100.0, 360, 720, 300]}) >>> df first second 0 0.0 100.0 1 5.0 360.0 2 10.0 720.0 3 15.0 300.0 >>> df.tan() first second 0 0.000000 -0.587214 1 -3.380515 -3.380140 2 0.648361 0.648446 3 -0.855993 45.244742
tan operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90]) >>> index Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64') >>> index.tan() Float64Index([-0.4227932187381618, -0.587213915156929, -1.3386902103511544, -1.995200412208242], dtype='float64')
-
tile(count)¶ Repeat the rows of the calling DataFrame count times to form a new DataFrame.
- Parameters
- count : int
Number of times to tile the rows. Must be non-negative.
- Returns
- DataFrame containing the tiled rows.
Examples
>>> import cudf >>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]]) >>> count = 2 >>> df.tile(count) 0 1 2 0 8 4 7 1 5 2 3 0 8 4 7 1 5 2 3
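To make the difference between tiling and repeating concrete, the following pure-Python sketch models `tile` on a list of row tuples and their labels. The helper `tile_model` is hypothetical and only mirrors the documented behavior: the whole block of rows, together with its labels, is repeated `count` times.

```python
def tile_model(labels, rows, count):
    """Hypothetical model of DataFrame.tile(count).

    labels: row labels of the frame; rows: one tuple per row.
    Returns the tiled (labels, rows); labels repeat along with the rows.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    # Sequence repetition concatenates whole copies: r0, r1, r0, r1, ...
    return labels * count, rows * count


labels, rows = tile_model([0, 1], [(8, 4, 7), (5, 2, 3)], 2)
print(labels)  # [0, 1, 0, 1]
print(rows)    # [(8, 4, 7), (5, 2, 3), (8, 4, 7), (5, 2, 3)]
```

By contrast, `repeat` emits the copies of each row consecutively (r0, r0, r1, r1).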
-
to_array(fillna=None)¶ Get a dense numpy array for the data.
- Parameters
- fillna : str or None
Defaults to None, which skips null values. If it equals "pandas", null values are filled with NaNs. Non-integral dtype is promoted to np.float64.
Notes
If fillna is None, null values are skipped, so the output can be shorter than the input.
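The two null-handling modes can be sketched in pure Python, with `None` standing in for a null. The helper `to_array_model` is hypothetical: it models only which entries survive and where NaN appears, not the dtype promotion described above.

```python
import math


def to_array_model(values, fillna=None):
    """Hypothetical model of to_array's null handling (None marks a null)."""
    if fillna is None:
        # Nulls are skipped, so the result can be shorter than the input.
        return [v for v in values if v is not None]
    if fillna == "pandas":
        # Nulls become NaN, so the result has the same length as the input.
        return [math.nan if v is None else v for v in values]
    raise ValueError("fillna must be None or 'pandas'")


print(to_array_model([1.5, None, 3.0]))            # [1.5, 3.0]
print(to_array_model([1.5, None, 3.0], "pandas"))  # [1.5, nan, 3.0]
```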
-
to_arrow()¶ Convert Index to a PyArrow Array.
Examples
>>> import cudf >>> idx = cudf.Index([-3, 10, 15, 20]) >>> idx.to_arrow() <pyarrow.lib.Int64Array object at 0x7fcaa6f53440> [ -3, 10, 15, 20 ]
-
to_dlpack()¶ Converts a cuDF object into a DLPack tensor.
DLPack is an open-source memory tensor structure: dmlc/dlpack.
This function takes a cuDF object and converts it to a PyCapsule object that contains a pointer to a DLPack tensor. The data is deep-copied from the cuDF object into the DLPack tensor.
- Parameters
- cudf_obj : DataFrame, Series, Index, or Column
- Returns
- pycapsule_obj : PyCapsule
Output DLPack tensor pointer, encapsulated in a PyCapsule object.
-
to_pandas()¶ Convert to a Pandas Index.
Examples
>>> import cudf >>> idx = cudf.Index([-3, 10, 15, 20]) >>> idx Int64Index([-3, 10, 15, 20], dtype='int64') >>> idx.to_pandas() Int64Index([-3, 10, 15, 20], dtype='int64') >>> type(idx.to_pandas()) <class 'pandas.core.indexes.numeric.Int64Index'> >>> type(idx) <class 'cudf.core.index.Int64Index'>
-
to_series(index=None, name=None)¶ Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.
- Parameters
- index : Index, optional
Index of the resulting Series. If None, defaults to the original index.
- name : str, optional
Name of the resulting Series. If None, defaults to the name of the original index.
- Returns
- Series
The dtype will be based on the type of the Index values.
-
unique()¶ Return unique values in the index.
- Returns
- Index without duplicates
-
property
values¶ Return an array representing the data in the Index.
- Returns
- array : A cupy array of the data in the Index.
Examples
>>> import cudf >>> index = cudf.Index([1, -10, 100, 20]) >>> index.values array([ 1, -10, 100, 20]) >>> type(index.values) <class 'cupy.core.core.ndarray'>
-
property
values_host¶ Return a numpy representation of the Index.
Only the values in the Index will be returned.
- Returns
- out : numpy.ndarray
The values of the Index.
Examples
>>> import cudf >>> index = cudf.Index([1, -10, 100, 20]) >>> index.values_host array([ 1, -10, 100, 20]) >>> type(index.values_host) <class 'numpy.ndarray'>
-
where(cond, other=None)¶ Replace values where the condition is False.
- Parameters
- cond : bool array-like with the same length as self
Where cond is True, keep the original value. Where False, replace with the corresponding value from other. Callables are not supported.
- other : scalar or array-like
Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.
- Returns
- Same type as caller
Examples
>>> import cudf >>> index = cudf.Index([4, 3, 2, 1, 0]) >>> index Int64Index([4, 3, 2, 1, 0], dtype='int64') >>> index.where(index > 2, 15) Int64Index([4, 3, 15, 15, 15], dtype='int64')
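The replacement rule is easy to state in pure Python. This sketch is hypothetical (`where_model` and `mask_model` are not cuDF functions), uses `None` to stand for a null, and assumes `other` is either a scalar or a list with the same length as the data; it also shows that `mask` is the complement of `where`.

```python
def where_model(values, cond, other=None):
    """Keep values[i] where cond[i] is True; otherwise take the replacement."""
    if len(cond) != len(values):
        raise ValueError("cond must have the same length as values")
    if isinstance(other, list):
        if len(other) != len(values):
            raise ValueError("other must have the same length as values")
        return [v if c else o for v, c, o in zip(values, cond, other)]
    # Scalar replacement (None stands in for a null).
    return [v if c else other for v, c in zip(values, cond)]


def mask_model(values, cond, other=None):
    """mask replaces where cond is True, i.e. where() with cond inverted."""
    return where_model(values, [not c for c in cond], other)


data = [4, 3, 2, 1, 0]
cond = [v > 2 for v in data]
print(where_model(data, cond, 15))  # [4, 3, 15, 15, 15]
print(mask_model(data, cond, 10))   # [10, 10, 2, 1, 0]
```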
RangeIndex¶
-
class
cudf.core.index.RangeIndex(start, stop=None, step=None, dtype=None, copy=False, name=None)¶ Immutable, ordered and sliceable sequence of integer labels. The basic object storing row labels for all cuDF objects.
- Parameters
- data : array-like (1-dimensional) / DataFrame
If it is a DataFrame, a MultiIndex is returned.
- dtype : NumPy dtype (default: object)
If dtype is None, we find the dtype that best fits the data.
- copy : bool
Make a copy of input data.
- name : object
Name to be stored in the index.
- tupleize_cols : bool (default: True)
When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.
- Returns
- Index
cudf Index
Examples
>>> import cudf >>> cudf.Index([1, 2, 3], dtype="uint64", name="a") UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]})) MultiIndex(levels=[0 1 1 2 dtype: int64, 0 2 1 3 dtype: int64], codes= a b 0 0 0 1 1 1)
- Attributes
dtype : dtype of the range of values in RangeIndex.
empty : Indicator whether Index is empty.
gpu_values : View the data as a numba device array object.
is_contiguous : Return whether the index is contiguous.
is_monotonic : Alias for is_monotonic_increasing.
is_monotonic_decreasing : Return whether the index is monotonic decreasing (only equal or decreasing values).
is_monotonic_increasing : Return whether the index is monotonic increasing (only equal or increasing values).
is_unique : Return whether the index has unique values.
name : Returns the name of the Index.
names : Returns a tuple containing the name of the Index.
ndim : Dimension of the data.
shape : Returns a tuple representing the dimensionality of the Index.
size : Return the number of elements in the underlying data.
start : The value of the start parameter (0 if this was not supplied).
stop : The value of the stop parameter.
values : Return an array representing the data in the Index.
values_host : Return a numpy representation of the Index.
Methods
acos() : Get Trigonometric inverse cosine, element-wise.
any() : Return whether any element is True in the Index.
append(other) : Append a collection of Index options together.
argsort([ascending]) : Return the integer indices that would sort the index.
asin() : Get Trigonometric inverse sine, element-wise.
astype(dtype[, copy]) : Create an Index with values cast to dtypes.
atan() : Get Trigonometric inverse tangent, element-wise.
clip([lower, upper, inplace, axis]) : Trim values at input threshold(s).
copy([deep]) : Make a copy of this object.
cos() : Get Trigonometric cosine, element-wise.
difference(other[, sort]) : Return a new Index with elements from the index that are not in other.
drop_duplicates([keep]) : Return Index with duplicate values removed.
dropna([how]) : Return an Index with null values removed.
equals(other) : Determine if two Index objects contain the same elements.
exp() : Get the exponential of all elements, element-wise.
fillna(value[, downcast]) : Fill null values with the specified value.
find_label_range(first, last) : Find range that starts with first and ends with last, inclusively.
from_pandas(index[, nan_as_null]) : Convert from a Pandas Index.
get_level_values(level) : Return an Index of values for requested level.
get_slice_bound(label, side, kind) : Calculate slice bound that corresponds to given label.
interleave_columns() : Interleave Series columns of a table into a single column.
isin(values) : Return a boolean array where the index values are in values.
isna() : Identify missing values.
isnull() : Identify missing values.
join(other[, how, level, return_indexers, sort]) : Compute join_index and indexers to conform data structures to the new index.
log() : Get the natural logarithm of all elements, element-wise.
mask(cond[, other, inplace]) : Replace values where the condition is True.
max() : Return the maximum value of the Index.
memory_usage(**kwargs) : Memory usage of the values.
min() : Return the minimum value of the Index.
notna() : Identify non-missing values.
notnull() : Identify non-missing values.
rank([axis, method, numeric_only, …]) : Compute numerical data ranks (1 through n) along axis.
rename(name[, inplace]) : Alter Index name.
repeat(repeats[, axis]) : Repeats elements consecutively.
sample([n, frac, replace, weights, …]) : Return a random sample of items from an axis of object.
scatter_by_map(map_index[, map_size, keep_index]) : Scatter to a list of dataframes.
searchsorted(values[, side, ascending, …]) : Find indices where elements should be inserted to maintain order.
shift([periods, freq, axis, fill_value]) : Shift values by periods positions.
sin() : Get Trigonometric sine, element-wise.
sort_values([return_indexer, ascending, key]) : Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
sqrt() : Get the non-negative square-root of all elements, element-wise.
sum() : Return the sum of all values of the Index.
take(indices) : Gather only the specific subset of indices.
tan() : Get Trigonometric tangent, element-wise.
tile(count) : Repeat the rows of the calling DataFrame count times to form a new DataFrame.
to_array([fillna]) : Get a dense numpy array for the data.
to_arrow() : Convert Index to a PyArrow Array.
to_dlpack() : Converts a cuDF object into a DLPack tensor.
to_frame([index, name]) : Create a DataFrame with a column containing this Index.
to_gpu_array([fillna]) : Get a dense numba device array for the data.
to_pandas() : Convert to a Pandas Index.
to_series([index, name]) : Create a Series with both index and values equal to the index keys.
unique() : Return unique values in the index.
where(cond[, other]) : Replace values where the condition is False.
replace
-
acos()¶ Get Trigonometric inverse cosine, element-wise.
The inverse of cos, so that if y = x.cos(), then x = y.acos() for x in the range [0, π].
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5]) >>> ser.acos() 0 3.141593 1 1.570796 2 0.000000 3 1.240482 4 1.047198 dtype: float64
acos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5], ... 'second': [0.234, 0.3, 0.1]}) >>> df first second 0 -1.0 0.234 1 0.0 0.300 2 0.5 0.100 >>> df.acos() first second 0 3.141593 1.334606 1 1.570796 1.266104 2 1.047198 1.470629
acos operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64') >>> index.acos() Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0, 1.5707963267948966, 1.266103672779499], dtype='float64')
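The round-trip identities for acos, asin and atan hold only on the principal range of each inverse function. The sketch below checks this with Python's standard `math` module rather than cuDF, so it runs on the host; the sample points 1.0 and 4.0 are arbitrary choices.

```python
import math

# Each inverse function returns values only from its principal range, so the
# round trip recovers x only when x already lies in that range.
inside = 1.0   # inside [0, pi], [-pi/2, pi/2] and (-pi/2, pi/2)
outside = 4.0  # outside all three ranges

for name, f, inv in [("cos", math.cos, math.acos),
                     ("sin", math.sin, math.asin),
                     ("tan", math.tan, math.atan)]:
    print(name, inv(f(inside)), inv(f(outside)))

# For acos, cos(4.0) == cos(2*pi - 4.0), and 2*pi - 4.0 lies in [0, pi].
print(math.isclose(math.acos(math.cos(outside)), 2 * math.pi - outside))  # True
```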
-
any()¶ Return whether any element is True in the Index.
-
append(other)¶ Append a collection of Index options together.
- Parameters
- otherIndex or list/tuple of indices
- Returns
- appendedIndex
Examples
>>> import cudf >>> idx = cudf.Index([1, 2, 10, 100]) >>> idx Int64Index([1, 2, 10, 100], dtype='int64') >>> other = cudf.Index([200, 400, 50]) >>> other Int64Index([200, 400, 50], dtype='int64') >>> idx.append(other) Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')
append also accepts a list of Index objects:
>>> idx.append([other, other]) Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
-
argsort(ascending=True, **kwargs)¶ Return the integer indices that would sort the index.
- Parameters
- ascending : bool, default True
If True, returns the indices for ascending order. If False, returns the indices for descending order.
- Returns
- array : A cupy array of integer indices that would sort the index if used as an indexer.
-
asin()¶ Get Trigonometric inverse sine, element-wise.
The inverse of sine, so that if y = x.sin(), then x = y.asin() for x in the range [-π/2, π/2].
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5]) >>> ser.asin() 0 -1.570796 1 0.000000 2 1.570796 3 0.330314 4 0.523599 dtype: float64
asin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5], ... 'second': [0.234, 0.3, 0.1]}) >>> df first second 0 -1.0 0.234 1 0.0 0.300 2 0.5 0.100 >>> df.asin() first second 0 -1.570796 0.236190 1 0.000000 0.304693 2 0.523599 0.100167
asin operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64') >>> index.asin() Float64Index([-1.5707963267948966, 0.41151684606748806, 1.5707963267948966, 0.3046926540153975], dtype='float64')
-
astype(dtype, copy=False)¶ Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.
- Parameters
- dtype : numpy dtype
Use a numpy.dtype to cast the entire Index object to.
- copy : bool, default False
If copy is True, astype always returns a newly allocated object. If copy is False (the default) and internal requirements on dtype are satisfied, the original data is used to create a new Index, or the original Index is returned.
- Returns
- Index
Index with values cast to specified dtype.
-
atan()¶ Get Trigonometric inverse tangent, element-wise.
The inverse of tan, so that if y = x.tan(), then x = y.atan() for x in the open range (-π/2, π/2).
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10]) >>> ser 0 -1.00000 1 0.00000 2 1.00000 3 0.32434 4 0.50000 5 -10.00000 dtype: float64 >>> ser.atan() 0 -0.785398 1 0.000000 2 0.785398 3 0.313635 4 0.463648 5 -1.471128 dtype: float64
atan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5], ... 'second': [0.234, 0.3, 10]}) >>> df first second 0 -1.0 0.234 1 -10.0 0.300 2 0.5 10.000 >>> df.atan() first second 0 -0.785398 0.229864 1 -1.471128 0.291457 2 0.463648 1.471128
atan operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64') >>> index.atan() Float64Index([-0.7853981633974483, 0.3805063771123649, 0.7853981633974483, 0.0, 0.2914567944778671], dtype='float64')
-
clip(lower=None, upper=None, inplace=False, axis=1)¶ Trim values at input threshold(s).
Assigns values outside the boundary to the boundary values. Thresholds can be scalar values or array-like; in the latter case, clipping is performed element-wise along the specified axis. Currently only axis=1 is supported.
- Parameters
- lower : scalar or array_like, default None
Minimum threshold value. All values below this threshold are set to it. If it is None, there is no clipping based on lower. For a Series/Index, lower is expected to be a scalar or an array of size 1.
- upper : scalar or array_like, default None
Maximum threshold value. All values above this threshold are set to it. If it is None, there is no clipping based on upper. For a Series/Index, upper is expected to be a scalar or an array of size 1.
- inplace : bool, default False
Whether to perform the operation in place on the data.
- Returns
- Clipped DataFrame/Series/Index/MultiIndex
Examples
>>> import cudf >>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']}) >>> df.clip(lower=[2, 'b'], upper=[3, 'c']) a b 0 2 b 1 2 b 2 3 c 3 3 c
>>> df.clip(lower=None, upper=[3, 'c']) a b 0 1 a 1 2 b 2 3 c 3 3 c
>>> df.clip(lower=[2, 'b'], upper=None) a b 0 2 b 1 2 b 2 3 c 3 4 d
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4], "b": [2, 3, 4, 5]}) >>> df.clip(lower=2, upper=3, inplace=True) >>> df a b 0 2 2 1 2 3 2 3 3 3 3 3
>>> import cudf >>> sr = cudf.Series([1, 2, 3, 4]) >>> sr.clip(lower=2, upper=3) 0 2 1 2 2 3 3 3 dtype: int64
>>> sr.clip(lower=None, upper=3) 0 1 1 2 2 3 3 3 dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True) >>> sr 0 2 1 2 2 3 3 4 dtype: int64
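For scalar thresholds on a Series or Index, the clipping rule reduces to a per-element comparison against each bound. This pure-Python sketch is hypothetical (`clip_model` is not a cuDF function), ignores nulls, and reproduces the Series examples above.

```python
def clip_model(values, lower=None, upper=None):
    """Hypothetical model of Series/Index.clip with scalar thresholds."""
    out = []
    for v in values:
        if lower is not None and v < lower:
            v = lower  # raise values below the lower bound
        if upper is not None and v > upper:
            v = upper  # cap values above the upper bound
        out.append(v)
    return out


sr = [1, 2, 3, 4]
print(clip_model(sr, lower=2, upper=3))     # [2, 2, 3, 3]
print(clip_model(sr, lower=None, upper=3))  # [1, 2, 3, 3]
print(clip_model(sr, lower=2, upper=None))  # [2, 2, 3, 4]
```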
-
copy(deep=True)¶ Make a copy of this object.
-
cos()¶ Get Trigonometric cosine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360]) >>> ser 0 0.00000 1 0.32434 2 0.50000 3 45.00000 4 90.00000 5 180.00000 6 360.00000 dtype: float64 >>> ser.cos() 0 1.000000 1 0.947861 2 0.877583 3 0.525322 4 -0.448074 5 -0.598460 6 -0.283691 dtype: float64
cos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15], ... 'second': [100.0, 360, 720, 300]}) >>> df first second 0 0.0 100.0 1 5.0 360.0 2 10.0 720.0 3 15.0 300.0 >>> df.cos() first second 0 1.000000 0.862319 1 0.283662 -0.283691 2 -0.839072 -0.839039 3 -0.759688 -0.022097
cos operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90]) >>> index Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64') >>> index.cos() Float64Index([ 0.9210609940028851, 0.8623188722876839, -0.5984600690578581, -0.4480736161291701], dtype='float64')
-
difference(other, sort=None)¶ Return a new Index with elements from the index that are not in other.
This is the set difference of two Index objects.
- Parameters
- other : Index or array-like
- sort : False or None, default None
Whether to sort the resulting index:
None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.
False : Do not sort the result.
- Returns
- difference : Index
Examples
>>> import cudf >>> idx1 = cudf.Index([2, 1, 3, 4]) >>> idx1 Int64Index([2, 1, 3, 4], dtype='int64') >>> idx2 = cudf.Index([3, 4, 5, 6]) >>> idx2 Int64Index([3, 4, 5, 6], dtype='int64') >>> idx1.difference(idx2) Int64Index([1, 2], dtype='int64') >>> idx1.difference(idx2, sort=False) Int64Index([2, 1], dtype='int64')
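The `sort` option can be modeled in pure Python. The sketch below is hypothetical (`difference_model` is not part of cuDF) and assumes pandas-style set semantics, where each surviving value appears once, in order of first appearance, before optional sorting; it reproduces both results of the example above.

```python
def difference_model(left, right, sort=None):
    """Hypothetical model of Index.difference(other, sort=...)."""
    if sort not in (None, False):
        raise ValueError("sort must be None or False")
    exclude = set(right)
    seen = set()
    result = []
    for v in left:
        # Keep each value of left that is not in right, once.
        if v not in exclude and v not in seen:
            seen.add(v)
            result.append(v)
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            pass  # incomparable elements: keep first-appearance order
    return result


print(difference_model([2, 1, 3, 4], [3, 4, 5, 6]))              # [1, 2]
print(difference_model([2, 1, 3, 4], [3, 4, 5, 6], sort=False))  # [2, 1]
```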
-
drop_duplicates(keep='first')¶ Return Index with duplicate values removed.
- Parameters
- keep : {'first', 'last', False}, default 'first'
- 'first' : Drop duplicates except for the first occurrence.
- 'last' : Drop duplicates except for the last occurrence.
- False : Drop all duplicates.
- Returns
- deduplicated : Index
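The three `keep` options differ only in which occurrences survive. In this pure-Python sketch (the helper `drop_duplicates_model` is hypothetical), survivors stay in their original order, as in pandas; the order of the Index that cuDF returns is not specified here.

```python
from collections import Counter


def drop_duplicates_model(values, keep="first"):
    """Hypothetical model of Index.drop_duplicates(keep=...)."""
    if keep == "first":
        seen, out = set(), []
        for v in values:
            if v not in seen:
                seen.add(v)
                out.append(v)
        return out
    if keep == "last":
        # Scan from the end so the last occurrence wins, then restore order.
        seen, out = set(), []
        for v in reversed(values):
            if v not in seen:
                seen.add(v)
                out.append(v)
        return out[::-1]
    if keep is False:
        # Only values that occur exactly once survive.
        counts = Counter(values)
        return [v for v in values if counts[v] == 1]
    raise ValueError("keep must be 'first', 'last' or False")


vals = [3, 1, 3, 2, 1]
print(drop_duplicates_model(vals))               # [3, 1, 2]
print(drop_duplicates_model(vals, keep="last"))  # [3, 2, 1]
print(drop_duplicates_model(vals, keep=False))   # [2]
```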
-
dropna(how='any')¶ Return an Index with null values removed.
- Parameters
- how : {'any', 'all'}, default 'any'
If the Index is a MultiIndex, drop the value when any or all levels are NaN.
- Returns
- valid : Index
Examples
>>> import cudf >>> index = cudf.Index(['a', None, 'b', 'c']) >>> index StringIndex(['a' None 'b' 'c'], dtype='object') >>> index.dropna() StringIndex(['a' 'b' 'c'], dtype='object')
Using dropna on a MultiIndex:
>>> midx = cudf.MultiIndex( ... levels=[[1, None, 4, None], [1, 2, 5]], ... codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]], ... names=["x", "y"], ... ) >>> midx MultiIndex(levels=[0 1 1 null 2 4 3 null dtype: int64, 0 1 1 2 2 5 dtype: int64], codes= x y 0 0 0 1 0 2 2 1 1 3 2 1 4 3 0) >>> midx.dropna() MultiIndex(levels=[0 1 1 4 dtype: int64, 0 1 1 2 2 5 dtype: int64], codes= x y 0 0 0 1 0 2 2 1 1)
-
property
dtype¶ dtype of the range of values in RangeIndex.
-
property
empty¶ Indicator whether Index is empty.
True if Index is entirely empty (no items).
- Returns
- out : bool
If Index is empty, return True, if not return False.
-
equals(other)¶ Determine if two Index objects contain the same elements.
- Returns
- out: bool
True if “other” is an Index and it has the same elements as calling index; False otherwise.
-
exp()¶ Get the exponential of all elements, element-wise.
Exponential is the inverse of the log function, so that x.exp().log() = x (up to floating-point rounding).
- Returns
- DataFrame/Series/Index
Result of the element-wise exponential.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100]) >>> ser 0 -1.00000 1 0.00000 2 1.00000 3 0.32434 4 0.50000 5 -10.00000 6 100.00000 dtype: float64 >>> ser.exp() 0 3.678794e-01 1 1.000000e+00 2 2.718282e+00 3 1.383117e+00 4 1.648721e+00 5 4.539993e-05 6 2.688117e+43 dtype: float64
exp operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5], ... 'second': [0.234, 0.3, 10]}) >>> df first second 0 -1.0 0.234 1 -10.0 0.300 2 0.5 10.000 >>> df.exp() first second 0 0.367879 1.263644 1 0.000045 1.349859 2 1.648721 22026.465795
exp operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64') >>> index.exp() Float64Index([0.36787944117144233, 1.4918246976412703, 2.718281828459045, 1.0, 1.3498588075760032], dtype='float64')
-
fillna(value, downcast=None)¶ Fill null values with the specified value.
- Parameters
- value : scalar
Scalar value used to fill nulls. This value cannot be list-like.
- downcast : dict, default None
This parameter is currently NON-FUNCTIONAL.
- Returns
- filled : Index
Examples
>>> import cudf >>> index = cudf.Index([1, 2, None, 4]) >>> index Int64Index([1, 2, null, 4], dtype='int64') >>> index.fillna(3) Int64Index([1, 2, 3, 4], dtype='int64')
-
find_label_range(first, last)¶ Find range that starts with first and ends with last, inclusively.
- Returns
- begin, end : 2-tuple of int
The starting index and the ending index. The last value occurs at position end - 1.
-
classmethod
from_pandas(index, nan_as_null=None)¶ Convert from a Pandas Index.
- Parameters
- index : Pandas Index object
A Pandas Index object to be converted to a cuDF Index.
- nan_as_null : bool, default None
If None or True, converts np.nan values to null values. If False, leaves np.nan values as is.
- Raises
- TypeError for invalid input type.
Examples
>>> import cudf >>> import pandas as pd >>> import numpy as np >>> data = [10, 20, 30, np.nan] >>> pdi = pd.Index(data) >>> cudf.core.index.Index.from_pandas(pdi) Float64Index([10.0, 20.0, 30.0, null], dtype='float64') >>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False) Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
-
get_level_values(level)¶ Return an Index of values for requested level.
This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.
- Parameters
- level : int or str
It is either the integer position or the name of the level.
- Returns
- Index
Calling object, as there is only one level in the Index.
See also
cudf.core.multiindex.MultiIndex.get_level_values : Get values for a level of a MultiIndex.
Notes
For Index, level should be 0, since there are no multiple levels.
Examples
>>> import cudf >>> idx = cudf.core.index.StringIndex(["a","b","c"]) >>> idx.get_level_values(0) StringIndex(['a' 'b' 'c'], dtype='object')
-
get_slice_bound(label, side, kind)¶ Calculate the slice bound that corresponds to the given label. Returns the leftmost position of the label (one past the rightmost position if side=='right').
- Parameters
- label : object
- side : {'left', 'right'}
- kind : {'ix', 'loc', 'getitem'}
- Returns
- int
Position of the slice bound for label.
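For a monotonically increasing index, both slice helpers reduce to binary search. The sketch below uses Python's `bisect` module; the helpers are hypothetical, ignore `kind`, and assume the labels are sorted in ascending order. Expressing `find_label_range` through the two slice bounds is likewise an assumption made for illustration.

```python
from bisect import bisect_left, bisect_right


def get_slice_bound_model(sorted_labels, label, side):
    """Leftmost position of label, or one past its rightmost position."""
    if side == "left":
        return bisect_left(sorted_labels, label)
    if side == "right":
        return bisect_right(sorted_labels, label)
    raise ValueError("side must be 'left' or 'right'")


def find_label_range_model(sorted_labels, first, last):
    """(begin, end) with end exclusive, so last sits at position end - 1."""
    return (get_slice_bound_model(sorted_labels, first, "left"),
            get_slice_bound_model(sorted_labels, last, "right"))


labels = [10, 20, 20, 30, 40]
print(get_slice_bound_model(labels, 20, "left"))   # 1
print(get_slice_bound_model(labels, 20, "right"))  # 3
begin, end = find_label_range_model(labels, 20, 30)
print(begin, end, labels[begin:end])               # 1 4 [20, 20, 30]
```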
-
property
gpu_values¶ View the data as a numba device array object
-
interleave_columns()¶ Interleave Series columns of a table into a single column.
Converts the column-major table into a single row-major column.
- Parameters
The method takes no parameters; it interleaves all columns of the calling DataFrame.
- Returns
- Series containing the interleaved columns as a single column.
Examples
>>> import cudf >>> df = cudf.DataFrame({0: ['A1', 'A2', 'A3'], 1: ['B1', 'B2', 'B3']}) >>> df 0 1 0 A1 B1 1 A2 B2 2 A3 B3 >>> df.interleave_columns() 0 A1 1 B1 2 A2 3 B2 4 A3 5 B3 dtype: object
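Interleaving is a row-major flattening of equal-length columns. The following pure-Python sketch (the helper `interleave_columns_model` is hypothetical) reproduces the output sequence A1, B1, A2, B2, A3, B3 from the columns ['A1', 'A2', 'A3'] and ['B1', 'B2', 'B3'].

```python
def interleave_columns_model(columns):
    """Hypothetical model of DataFrame.interleave_columns.

    columns: list of equal-length columns (lists). Returns one flat list
    holding row 0 of every column, then row 1, and so on.
    """
    if len({len(col) for col in columns}) > 1:
        raise ValueError("all columns must have the same length")
    # zip(*columns) yields one tuple per row, in column order.
    return [value for row in zip(*columns) for value in row]


print(interleave_columns_model([["A1", "A2", "A3"], ["B1", "B2", "B3"]]))
# ['A1', 'B1', 'A2', 'B2', 'A3', 'B3']
```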
-
property
is_contiguous¶ Return whether the index is contiguous. Always True for a RangeIndex.
-
property
is_monotonic¶ Alias for is_monotonic_increasing.
-
property
is_monotonic_decreasing¶ Return whether the index is monotonic decreasing (only equal or decreasing values).
-
property
is_monotonic_increasing¶ Return whether the index is monotonic increasing (only equal or increasing values).
-
property
is_unique¶ Return whether the index has unique values.
-
isin(values)¶ Return a boolean array where the index values are in values.
Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.
- Parameters
- values : set, list-like, Index
Sought values.
- Returns
- is_contained : cupy array
CuPy array of boolean values.
-
isna()¶ Identify missing values. Alias for isnull.
-
isnull()¶ Identify missing values.
-
join(other, how='left', level=None, return_indexers=False, sort=False)¶ Compute join_index and indexers to conform data structures to the new index.
- Parameters
- other : Index
- how : {'left', 'right', 'inner', 'outer'}
- return_indexers : bool, default False
- sort : bool, default False
Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).
- Returns
- index : Index
Examples
>>> import cudf >>> lhs = cudf.DataFrame( ... {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b'] ... ).index >>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index >>> lhs.join(rhs, how='inner') MultiIndex(levels=[0 1 1 3 dtype: int64, 0 2 1 4 dtype: int64], codes= a b 0 1 1 1 0 0)
-
log()¶ Get the natural logarithm of all elements, element-wise.
Natural logarithm is the inverse of the exp function, so that x.log().exp() = x for x > 0.
- Returns
- DataFrame/Series/Index
Result of the element-wise natural logarithm.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100]) >>> ser 0 -1.00000 1 0.00000 2 1.00000 3 0.32434 4 0.50000 5 -10.00000 6 100.00000 dtype: float64 >>> ser.log() 0 NaN 1 -inf 2 0.000000 3 -1.125963 4 -0.693147 5 NaN 6 4.605170 dtype: float64
log operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5], ... 'second': [0.234, 0.3, 10]}) >>> df first second 0 -1.0 0.234 1 -10.0 0.300 2 0.5 10.000 >>> df.log() first second 0 NaN -1.452434 1 NaN -1.203973 2 -0.693147 2.302585
log operation on Index:
>>> index = cudf.Index([10, 11, 500.0]) >>> index Float64Index([10.0, 11.0, 500.0], dtype='float64') >>> index.log() Float64Index([2.302585092994046, 2.3978952727983707, 6.214608098422191], dtype='float64')
-
mask(cond, other=None, inplace=False)¶ Replace values where the condition is True.
- Parameters
- cond : bool Series/DataFrame, array-like
Where cond is False, keep the original value. Where True, replace with the corresponding value from other. Callables are not supported.
- other : scalar, list of scalars, Series/DataFrame
Entries where cond is True are replaced with the corresponding value from other. Callables are not supported. Default is None.
DataFrame expects only a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.
Series expects only a scalar or a Series-like object with the same length.
- inplace : bool, default False
Whether to perform the operation in place on the data.
- Returns
- Same type as caller
Examples
>>> import cudf >>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]}) >>> df.mask(df % 2 == 0, [-1, -1]) A B 0 1 3 1 -1 5 2 5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0]) >>> ser.mask(ser > 2, 10) 0 10 1 10 2 2 3 1 4 0 dtype: int64 >>> ser.mask(ser > 2) 0 null 1 null 2 2 3 1 4 0 dtype: int64
-
max()¶ Return the maximum value of the Index.
- Returns
- scalar
Maximum value.
See also
Index.min : Return the minimum value in an Index.
cudf.core.series.Series.max : Return the maximum value in a Series.
cudf.core.dataframe.DataFrame.max : Return the maximum values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.max() 3
-
memory_usage(**kwargs)¶ Memory usage of the values.
- Parameters
- deep : bool
Introspect the data deeply, interrogate object dtypes for system-level memory consumption.
- Returns
- bytes used
-
min()¶ Return the minimum value of the Index.
- Returns
- scalar
Minimum value.
See also
Index.max : Return the maximum value in an Index.
cudf.core.series.Series.min : Return the minimum value in a Series.
cudf.core.dataframe.DataFrame.min : Return the minimum values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.min() 1
-
property
name¶ Returns the name of the Index.
-
property
names¶ Returns a tuple containing the name of the Index.
-
property
ndim¶ Dimension of the data. Apart from MultiIndex ndim is always 1.
-
notna()¶ Identify non-missing values. Alias for notnull.
-
notnull()¶ Identify non-missing values.
-
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)¶ Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.
- Parameters
- axis : {0 or 'index', 1 or 'columns'}, default 0
Index to direct ranking.
- method : {'average', 'min', 'max', 'first', 'dense'}, default 'average'
How to rank the group of records that have the same value (i.e. ties):
* average: average rank of the group
* min: lowest rank in the group
* max: highest rank in the group
* first: ranks assigned in order they appear in the array
* dense: like 'min', but rank always increases by 1 between groups.
- numeric_only : bool, optional
For DataFrame objects, rank only numeric columns if set to True.
- na_option : {'keep', 'top', 'bottom'}, default 'keep'
How to rank NaN values:
* keep: assign NaN rank to NaN values
* top: assign smallest rank to NaN values if ascending
* bottom: assign highest rank to NaN values if ascending.
- ascending : bool, default True
Whether or not the elements should be ranked in ascending order.
- pct : bool, default False
Whether or not to display the returned rankings in percentile form.
Whether or not to display the returned rankings in percentile form.
- Returns
- same type as caller
Return a Series or DataFrame with data ranks as values.
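The tie-breaking rules for `method` can be checked with a small pure-Python model. The helper `rank_model` is hypothetical; it ranks a single list, assumes there are no missing values (so `na_option` is not modeled), and returns float ranks.

```python
def rank_model(values, method="average", ascending=True):
    """Hypothetical model of rank(method=..., ascending=...) for one column."""
    if method not in ("average", "min", "max", "first", "dense"):
        raise ValueError(f"unknown method: {method}")
    n = len(values)
    # Stable sort of positions; tied values keep their original order.
    order = sorted(range(n), key=lambda i: values[i], reverse=not ascending)
    ranks = [0.0] * n
    group = 0  # dense rank of the current tie group
    start = 0
    while start < n:
        # Extend [start, stop) over sorted positions sharing the same value.
        stop = start + 1
        while stop < n and values[order[stop]] == values[order[start]]:
            stop += 1
        group += 1
        for k in range(start, stop):
            if method == "average":
                r = (start + 1 + stop) / 2  # mean of ranks start+1 .. stop
            elif method == "min":
                r = start + 1
            elif method == "max":
                r = stop
            elif method == "first":
                r = k + 1
            else:  # dense
                r = group
            ranks[order[k]] = float(r)
        start = stop
    return ranks


data = [3, 1, 3, 2, 2]
for m in ("average", "min", "max", "first", "dense"):
    print(m, rank_model(data, method=m))
```

For `data = [3, 1, 3, 2, 2]`, "average" gives [4.5, 1.0, 4.5, 2.5, 2.5] and "dense" gives [3.0, 1.0, 3.0, 2.0, 2.0].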
-
rename(name, inplace=False)¶ Alter Index name.
Defaults to returning new index.
- Parameters
- name : label
Name(s) to set.
- Returns
- Index
-
repeat(repeats, axis=None)¶ Repeats elements consecutively.
Returns a new object of the caller's type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.
- Parameters
- repeats : int, or array of ints
The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.
- Returns
- Series/DataFrame/Index
A newly created object of same type as caller with repeated elements.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})
>>> df
   a   b
0  1  10
1  2  20
2  3  30
>>> df.repeat(3)
   a   b
0  1  10
0  1  10
0  1  10
1  2  20
1  2  20
1  2  20
2  3  30
2  3  30
2  3  30
Repeat on Series:
>>> s = cudf.Series([0, 2])
>>> s
0    0
1    2
dtype: int64
>>> s.repeat([3, 4])
0    0
0    0
0    0
1    2
1    2
1    2
1    2
dtype: int64
>>> s.repeat(2)
0    0
0    0
1    2
1    2
dtype: int64
Repeat on Index:
>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33, 33, 33, 33, 33, 55, 55, 55, 55, 55], dtype='int64')
-
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)¶ Return a random sample of items from an axis of the object.
You can use random_state for reproducibility.
- Parameters
- nint, optional
Number of items from axis to return. Cannot be used with frac. Defaults to 1 if frac is None.
- fracfloat, optional
Fraction of axis items to return. Cannot be used with n.
- replacebool, default False
Allow or disallow sampling of the same row more than once. replace=True is not yet supported for axis=1/‘columns’.
- weightsstr or ndarray-like, optional
Only supported for axis=1/‘columns’.
- random_stateint or None, default None
Seed for the random number generator (if int), or None. If None, a random seed will be chosen.
- axis{0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index do not support axis=1.
- Returns
- Series or DataFrame or Index
A new object of same type as caller containing n items randomly sampled from the caller object.
Examples
>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1
>>> sr = cudf.Series([1, 2, 3, 4, 5])
>>> sr.sample(10, replace=True)  # 10 draws; rows may repeat and vary between runs
>>> df = cudf.DataFrame(
...     {"a": [1, 2], "b": [2, 3], "c": [3, 4], "d": [4, 5]})
>>> df.sample(2, axis=1)
   a  c
0  1  3
1  2  4
-
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)¶ Scatter to a list of dataframes.
Uses map_index to determine the destination of each row of the original DataFrame.
- Parameters
- map_indexSeries, str or list-like
Scatter assignment for each row
- map_sizeint
Length of output list. Must be greater than or equal to the number of unique values in map_index.
- keep_indexbool
Preserve the original index values of each row.
- Returns
- A list of cudf.DataFrame objects.
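As a rough mental model, scatter_by_map buckets rows by a destination number. The pure-Python sketch below is not cuDF code: the helper name scatter_by_map_model, the representation of rows as tuples, integer destinations, and the default of sizing the output from the largest destination are all assumptions of this sketch. cuDF also accepts a column name for map_index and handles the index through keep_index, neither of which is modeled here.

```python
def scatter_by_map_model(rows, map_index, map_size=None):
    """Send rows[i] to output partition map_index[i]; rows keep their relative order."""
    if len(rows) != len(map_index):
        raise ValueError("map_index needs exactly one destination per row")
    if map_size is None:
        # Assumption for this sketch: size the output from the largest destination.
        map_size = max(map_index, default=-1) + 1
    partitions = [[] for _ in range(map_size)]
    for row, dest in zip(rows, map_index):
        if not 0 <= dest < map_size:
            raise ValueError(f"destination {dest} is outside range(map_size={map_size})")
        partitions[dest].append(row)
    return partitions


rows = [("r0", 10), ("r1", 11), ("r2", 12), ("r3", 13)]
print(scatter_by_map_model(rows, [1, 0, 1, 2]))
# [[('r1', 11)], [('r0', 10), ('r2', 12)], [('r3', 13)]]
```

Passing a larger map_size simply produces trailing empty partitions, which is why map_size only needs to be at least the number of distinct destinations.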
-
searchsorted(values, side='left', ascending=True, na_position='last')¶ Find indices where elements should be inserted to maintain order.
- Parameters
- valuesFrame (shape must be consistent with self)
Values to be hypothetically inserted into self.
- sidestr {‘left’, ‘right’}, optional, default ‘left’
If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.
- ascendingbool optional, default True
Sorted Frame is in ascending order (otherwise descending)
- na_positionstr {‘last’, ‘first’}, optional, default ‘last’
Position of null values in sorted order.
- Returns
- 1-D cupy array of insertion points
Examples
>>> import cudf
>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)
If the values are not monotonically sorted, wrong locations may be returned:
>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0  # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
...                             'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
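For a single ascending column without nulls, the ‘left’/‘right’ semantics match Python's bisect module, which gives a quick way to predict the expected output. The sketch below is an illustration under that assumption, not how cuDF computes the result on the GPU; the name searchsorted_model is hypothetical, and ascending=False, na_position, and multi-column frames are not modeled.

```python
from bisect import bisect_left, bisect_right


def searchsorted_model(sorted_values, values, side="left"):
    """Insertion points that keep an ascending, null-free list sorted."""
    if side == "left":
        find = bisect_left   # first suitable position
    elif side == "right":
        find = bisect_right  # position after any equal elements
    else:
        raise ValueError("side must be 'left' or 'right'")
    return [find(sorted_values, v) for v in values]


print(searchsorted_model([1, 2, 3], [0, 4]))                # [0, 3]
print(searchsorted_model([1, 2, 3], [1, 3], side="right"))  # [1, 3]
print(searchsorted_model([2, 1, 3], [1]))                   # [0]: unsorted input, wrong position
```

The last call reproduces the unsorted-input pitfall from the examples above: a binary search only gives meaningful positions when the searched data is already sorted.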
-
property
shape¶ Returns a tuple representing the dimensionality of the Index.
-
shift(periods=1, freq=None, axis=0, fill_value=None)¶ Shift values by periods positions.
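The one-line description above leaves the direction implicit. Assuming pandas-compatible semantics (positive periods move values toward later positions, and vacated slots become null, or fill_value when given), the behavior can be modeled in plain Python. This is a hypothetical sketch: shift_model is not a cuDF function, None stands in for null, and freq and axis are not modeled.

```python
def shift_model(values, periods=1, fill_value=None):
    """Move values by `periods` positions, filling vacated slots with fill_value."""
    n = len(values)
    if abs(periods) >= n:
        return [fill_value] * n  # every value is shifted out
    if periods >= 0:
        # Move toward the end; the first `periods` slots are vacated.
        return [fill_value] * periods + values[: n - periods]
    # Negative periods move toward the start; the last slots are vacated.
    return values[-periods:] + [fill_value] * (-periods)


print(shift_model([1, 2, 3, 4]))                   # [None, 1, 2, 3]
print(shift_model([1, 2, 3, 4], periods=-2))       # [3, 4, None, None]
print(shift_model([1, 2, 3, 4], 1, fill_value=0))  # [0, 1, 2, 3]
```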
-
sin()¶ Get Trigonometric sine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.sin()
0    0.000000
1    0.318683
2    0.479426
3    0.850904
4    0.893997
5   -0.801153
6    0.958916
dtype: float64
sin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.sin()
      first    second
0  0.000000 -0.506366
1 -0.958924  0.958916
2 -0.544021 -0.544072
3  0.650288 -0.999756
sin operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.sin()
Float64Index([-0.3894183423086505, -0.5063656411097588, 0.8011526357338306, 0.8939966636005579], dtype='float64')
-
property
size¶ Return the number of elements in the underlying data.
- Returns
- sizeSize of the DataFrame / Index / Series / MultiIndex
Examples
Size of an empty dataframe is 0.
>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0
DataFrame with values:
>>> df = cudf.DataFrame({'a': [10, 11, 12],
...                      'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3
Size of an Index:
>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Int64Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4
Size of a MultiIndex:
>>> midx = cudf.MultiIndex(
...     levels=[["a", "b", "c", None], ["1", None, "5"]],
...     codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...     names=["x", "y"],
... )
>>> midx
MultiIndex(levels=[0       a
1       b
2       c
3    None
dtype: object, 0       1
1    None
2       5
dtype: object],
codes=   x  y
0  0  0
1  0  2
2  1  1
3  2  1
4  3  0)
>>> midx.size
5
-
sort_values(return_indexer=False, ascending=True, key=None)¶ Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
- Parameters
- return_indexerbool, default False
Should the indices that would sort the index be returned.
- ascendingbool, default True
Should the index values be sorted in an ascending order.
- keyNone, optional
This parameter is NON-FUNCTIONAL.
- Returns
- sorted_indexIndex
Sorted copy of the index.
- indexercupy.ndarray, optional
The indices that the index itself was sorted by.
See also
cudf.core.series.Series.sort_valuesSort values of a Series.
cudf.core.dataframe.DataFrame.sort_valuesSort values in a DataFrame.
Examples
>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')
Sort values in ascending order (default behavior):
>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')
Sort values in descending order, and also get the indices idx was sorted by:
>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2], dtype=int32))
-
sqrt()¶ Get the non-negative square-root of all elements, element-wise.
- Returns
- DataFrame/Series/Index
Result of the non-negative square-root of each element.
Examples
>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64
sqrt operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-10.0, 100, 625],
...                      'second': [1, 2, 0.4]})
>>> df
   first  second
0  -10.0     1.0
1  100.0     2.0
2  625.0     0.4
>>> df.sqrt()
   first    second
0    NaN  1.000000
1   10.0  1.414214
2   25.0  0.632456
sqrt operation on Index:
>>> index = cudf.Index([-10.0, 100, 625])
>>> index
Float64Index([-10.0, 100.0, 625.0], dtype='float64')
>>> index.sqrt()
Float64Index([nan, 10.0, 25.0], dtype='float64')
-
property
start¶ The value of the start parameter (0 if this was not supplied).
-
property
stop¶ The value of the stop parameter.
-
sum()¶ Return the sum of all values of the Index.
- Returns
- scalar
Sum of all values.
Examples
>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.sum()
6
-
take(indices)¶ Gather only the specified subset of indices.
- Parameters
- indices: An array-like that maps to values contained in this Index.
-
tan()¶ Get Trigonometric tangent, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.tan()
0    0.000000
1    0.336213
2    0.546302
3    1.619775
4   -1.995200
5    1.338690
6   -3.380140
dtype: float64
tan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.tan()
      first     second
0  0.000000  -0.587214
1 -3.380515  -3.380140
2  0.648361   0.648446
3 -0.855993  45.244742
tan operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.tan()
Float64Index([-0.4227932187381618, -0.587213915156929, -1.3386902103511544, -1.995200412208242], dtype='float64')
-
tile(count)¶ Repeats the rows of this DataFrame count times to form a new DataFrame.
- Parameters
- selfInput table containing the rows to tile.
- countNumber of times to tile “rows”. Must be non-negative.
- Returns
- The table containing the tiled “rows”.
Examples
>>> import cudf
>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
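tile differs from repeat: repeat duplicates each element consecutively, while tile repeats the whole block of rows, as the index 0, 1, 0, 1 in the example shows. A minimal pure-Python sketch of that behavior, assuming rows are tuples and the index is a list of labels (tile_model is a hypothetical name, not a cuDF function):

```python
def tile_model(index, rows, count):
    """Repeat the whole block of rows `count` times, index labels included."""
    if count < 0:
        raise ValueError("count must be non-negative")
    tiled_index = [label for _ in range(count) for label in index]
    tiled_rows = [row for _ in range(count) for row in rows]
    return tiled_index, tiled_rows


idx, rows = tile_model([0, 1], [(8, 4, 7), (5, 2, 3)], 2)
print(idx)   # [0, 1, 0, 1]
print(rows)  # [(8, 4, 7), (5, 2, 3), (8, 4, 7), (5, 2, 3)]
```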
-
to_array(fillna=None)¶ Get a dense numpy array for the data.
- Parameters
- fillnastr or None
Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.
Notes
If fillna is None, null values are skipped. Therefore, the output size could be smaller.
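The two fillna modes can be contrasted with a small host-side model in which None stands for null. This is a sketch, not cuDF code: to_array_model is a hypothetical name, and converting every value to float in "pandas" mode (so that NaN can be stored) is an assumption of the sketch rather than a statement of cuDF's exact dtype rules.

```python
import math


def to_array_model(values, fillna=None):
    """Model of the fillna modes described above; None stands for null."""
    if fillna is None:
        # Nulls are dropped, so the result may be shorter than the input.
        return [v for v in values if v is not None]
    if fillna == "pandas":
        # Nulls become NaN, so the result keeps the input length.
        return [math.nan if v is None else float(v) for v in values]
    raise ValueError("this sketch only models fillna=None and fillna='pandas'")


print(to_array_model([1, None, 3]))            # [1, 3]
print(to_array_model([1, None, 3], "pandas"))  # [1.0, nan, 3.0]
```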
-
to_arrow()¶ Convert Index to a PyArrow Array.
Examples
>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx.to_arrow()
<pyarrow.lib.Int64Array object at 0x7fcaa6f53440>
[
  -3,
  10,
  15,
  20
]
-
to_dlpack()¶ Converts a cuDF object into a DLPack tensor.
DLPack is an open-source memory tensor structure: dmlc/dlpack.
This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.
- Parameters
- cudf_objDataFrame, Series, Index, or Column
- Returns
- pycapsule_objPyCapsule
Output DLPack tensor pointer which is encapsulated in a PyCapsule object.
-
to_frame(index=True, name=None)¶ Create a DataFrame with a column containing this Index.
- Parameters
- indexboolean, default True
Set the index of the returned DataFrame as the original Index.
- namestr, default None
Name to be used for the column
- Returns
- DataFrame
cudf DataFrame
-
to_gpu_array(fillna=None)¶ Get a dense numba device array for the data.
- Parameters
- fillnastr or None
Replacement value to fill in place of nulls.
Notes
If fillna is None, null values are skipped. Therefore, the output size could be smaller.
-
to_pandas()¶ Convert to a Pandas Index.
Examples
>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.numeric.Int64Index'>
>>> type(idx)
<class 'cudf.core.index.GenericIndex'>
-
to_series(index=None, name=None)¶ Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.
- Parameters
- indexIndex, optional
Index of resulting Series. If None, defaults to original index.
- namestr, optional
Name of resulting Series. If None, defaults to name of original index.
- Returns
- Series
The dtype will be based on the type of the Index values.
-
unique()¶ Return unique values in the index.
- Returns
- Index without duplicates
-
property
values¶ Return an array representing the data in the Index.
- Returns
- arrayA cupy array of data in the Index.
Examples
>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values
array([  1, -10, 100,  20])
>>> type(index.values)
<class 'cupy.core.core.ndarray'>
-
property
values_host¶ Return a numpy representation of the Index.
Only the values in the Index will be returned.
- Returns
- outnumpy.ndarray
The values of the Index.
Examples
>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values_host
array([  1, -10, 100,  20])
>>> type(index.values_host)
<class 'numpy.ndarray'>
-
where(cond, other=None)¶ Replace values where the condition is False.
- Parameters
- condbool array-like with the same length as self
Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.
- other: scalar, or array-like
Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.
- Returns
- Same type as caller
Examples
>>> import cudf
>>> index = cudf.Index([4, 3, 2, 1, 0])
>>> index
Int64Index([4, 3, 2, 1, 0], dtype='int64')
>>> index.where(index > 2, 15)
Int64Index([4, 3, 15, 15, 15], dtype='int64')
GenericIndex¶
-
class
cudf.core.index.GenericIndex(values, **kwargs)¶ Immutable, ordered and sliceable sequence of integer labels. The basic object storing row labels for all cuDF objects.
- Parameters
- dataarray-like (1-dimensional)/ DataFrame
If it is a DataFrame, it will return a MultiIndex
- dtypeNumPy dtype (default: object)
If dtype is None, we find the dtype that best fits the data.
- copybool
Make a copy of input data.
- nameobject
Name to be stored in the index.
- tupleize_colsbool (default: True)
When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.
- Returns
- Index
cudf Index
Examples
>>> import cudf
>>> cudf.Index([1, 2, 3], dtype="uint64", name="a")
UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a": [1, 2], "b": [2, 3]}))
MultiIndex(levels=[0    1
1    2
dtype: int64, 0    2
1    3
dtype: int64],
codes=   a  b
0  0  0
1  1  1)
- Attributes
dtypedtype of the underlying values in GenericIndex.
emptyIndicator whether Index is empty.
gpu_valuesView the data as a numba device array object
is_monotonicAlias for is_monotonic_increasing.
is_monotonic_decreasingReturn if the index is monotonic decreasing (only equal or decreasing) values.
is_monotonic_increasingReturn if the index is monotonic increasing (only equal or increasing) values.
is_uniqueReturn if the index has unique values.
nameReturns the name of the Index.
namesReturns a tuple containing the name of the Index.
ndimDimension of the data.
shapeReturns a tuple representing the dimensionality of the Index.
sizeReturn the number of elements in the underlying data.
valuesReturn an array representing the data in the Index.
values_hostReturn a numpy representation of the Index.
Methods
acos()Get Trigonometric inverse cosine, element-wise.
any()Return whether any element is True in the Index.
append(other)Append a collection of Index options together.
argsort([ascending])Return the integer indices that would sort the index.
asin()Get Trigonometric inverse sine, element-wise.
astype(dtype[, copy])Create an Index with values cast to dtypes.
atan()Get Trigonometric inverse tangent, element-wise.
clip([lower, upper, inplace, axis])Trim values at input threshold(s).
copy([deep])Make a copy of this object.
cos()Get Trigonometric cosine, element-wise.
difference(other[, sort])Return a new Index with elements from the index that are not in other.
drop_duplicates([keep])Return Index with duplicate values removed
dropna([how])Return an Index with null values removed.
equals(other)Determine if two Index objects contain the same elements.
exp()Get the exponential of all elements, element-wise.
fillna(value[, downcast])Fill null values with the specified value.
find_label_range(first, last)Find range that starts with first and ends with last, inclusively.
from_pandas(index[, nan_as_null])Convert from a Pandas Index.
get_level_values(level)Return an Index of values for requested level.
get_slice_bound(label, side, kind)Calculate slice bound that corresponds to given label.
interleave_columns()Interleave Series columns of a table into a single column.
isin(values)Return a boolean array where the index values are in values.
isna()Identify missing values.
isnull()Identify missing values.
join(other[, how, level, return_indexers, sort])Compute join_index and indexers to conform data structures to the new index.
log()Get the natural logarithm of all elements, element-wise.
mask(cond[, other, inplace])Replace values where the condition is True.
max()Return the maximum value of the Index.
memory_usage([deep])Memory usage of the values.
min()Return the minimum value of the Index.
notna()Identify non-missing values.
notnull()Identify non-missing values.
rank([axis, method, numeric_only, …])Compute numerical data ranks (1 through n) along axis.
rename(name[, inplace])Alter Index name.
repeat(repeats[, axis])Repeats elements consecutively.
sample([n, frac, replace, weights, …])Return a random sample of items from an axis of object.
scatter_by_map(map_index[, map_size, keep_index])Scatter to a list of dataframes.
searchsorted(values[, side, ascending, …])Find indices where elements should be inserted to maintain order
shift([periods, freq, axis, fill_value])Shift values by periods positions.
sin()Get Trigonometric sine, element-wise.
sort_values([return_indexer, ascending, key])Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
sqrt()Get the non-negative square-root of all elements, element-wise.
sum()Return the sum of all values of the Index.
take(indices)Gather only the specific subset of indices
tan()Get Trigonometric tangent, element-wise.
tile(count)Repeats the rows from self DataFrame count times to form a new DataFrame.
to_array([fillna])Get a dense numpy array for the data.
to_arrow()Convert Index to a PyArrow Array.
to_dlpack()Converts a cuDF object into a DLPack tensor.
to_frame([index, name])Create a DataFrame with a column containing this Index
to_pandas()Convert to a Pandas Index.
to_series([index, name])Create a Series with both index and values equal to the index keys.
unique()Return unique values in the index.
where(cond[, other])Replace values where the condition is False.
replace
-
acos()¶ Get Trigonometric inverse cosine, element-wise.
The inverse of cos so that, if y = x.cos(), then x = y.acos()
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64
acos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629
acos operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([3.141592653589793, 1.1592794807274085, 0.0, 1.5707963267948966, 1.266103672779499], dtype='float64')
-
any()¶ Return whether any element is True in the Index.
-
append(other)¶ Append a collection of Index options together.
- Parameters
- otherIndex or list/tuple of indices
- Returns
- appendedIndex
Examples
>>> import cudf
>>> idx = cudf.Index([1, 2, 10, 100])
>>> idx
Int64Index([1, 2, 10, 100], dtype='int64')
>>> other = cudf.Index([200, 400, 50])
>>> other
Int64Index([200, 400, 50], dtype='int64')
>>> idx.append(other)
Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')
append also accepts a list of Index objects:
>>> idx.append([other, other])
Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
-
argsort(ascending=True, **kwargs)¶ Return the integer indices that would sort the index.
- Parameters
- ascendingbool, default True
If True, returns the indices for ascending order. If False, returns the indices for descending order.
- Returns
- arrayA cupy array containing integer indices that would sort the index if used as an indexer.
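Conceptually, argsort returns positions rather than values: indexing the original labels with those positions yields them in sorted order. A pure-Python sketch of that idea (argsort_model is a hypothetical name; the model assumes comparable, non-null values and breaks ties by original position):

```python
def argsort_model(values, ascending=True):
    """Positions that would sort `values`; ties keep their original order."""
    return sorted(range(len(values)), key=values.__getitem__, reverse=not ascending)


labels = [10, 100, 1, 1000]
order = argsort_model(labels)
print(order)                                   # [2, 0, 1, 3]
print([labels[i] for i in order])              # [1, 10, 100, 1000]
print(argsort_model(labels, ascending=False))  # [3, 1, 0, 2]
```

The descending result matches the indexer returned by idx.sort_values(ascending=False, return_indexer=True) for the same labels.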
-
asin()¶ Get Trigonometric inverse sine, element-wise.
The inverse of sine so that, if y = x.sin(), then x = y.asin()
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.asin()
0   -1.570796
1    0.000000
2    1.570796
3    0.330314
4    0.523599
dtype: float64
asin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.asin()
      first    second
0 -1.570796  0.236190
1  0.000000  0.304693
2  0.523599  0.100167
asin operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64')
>>> index.asin()
Float64Index([-1.5707963267948966, 0.41151684606748806, 1.5707963267948966, 0.3046926540153975], dtype='float64')
-
astype(dtype, copy=False)¶ Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.
- Parameters
- dtypenumpy dtype
Use a numpy.dtype to cast entire Index object to.
- copybool, default False
If copy is True, astype always returns a newly allocated object. If copy is False (the default) and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.
- Returns
- Index
Index with values cast to specified dtype.
-
atan()¶ Get Trigonometric inverse tangent, element-wise.
The inverse of tan so that, if y = x.tan(), then x = y.atan()
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10])
>>> ser
0    -1.00000
1     0.00000
2     1.00000
3     0.32434
4     0.50000
5   -10.00000
dtype: float64
>>> ser.atan()
0   -0.785398
1    0.000000
2    0.785398
3    0.313635
4    0.463648
5   -1.471128
dtype: float64
atan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.atan()
      first    second
0 -0.785398  0.229864
1 -1.471128  0.291457
2  0.463648  1.471128
atan operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.atan()
Float64Index([-0.7853981633974483, 0.3805063771123649, 0.7853981633974483, 0.0, 0.2914567944778671], dtype='float64')
-
clip(lower=None, upper=None, inplace=False, axis=1)¶ Trim values at input threshold(s).
Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.
- Parameters
- lowerscalar or array_like, default None
Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.
- upperscalar or array_like, default None
Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series/Index, upper is expected to be a scalar or an array of size 1.
- inplacebool, default False
Whether to perform the operation in place on the data.
- Returns
- Clipped DataFrame/Series/Index/MultiIndex
Examples
>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4], "b": ['a', 'b', 'c', 'd']})
>>> df.clip(lower=[2, 'b'], upper=[3, 'c'])
   a  b
0  2  b
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=None, upper=[3, 'c'])
   a  b
0  1  a
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=[2, 'b'], upper=None)
   a  b
0  2  b
1  2  b
2  3  c
3  4  d
>>> df.clip(lower=2, upper=3, inplace=True)
>>> df
   a  b
0  2  2
1  2  3
2  3  3
3  3  3
>>> import cudf
>>> sr = cudf.Series([1, 2, 3, 4])
>>> sr.clip(lower=2, upper=3)
0    2
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=None, upper=3)
0    1
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True)
>>> sr
0    2
1    2
2    3
3    4
dtype: int64
-
copy(deep=True)¶ Make a copy of this object.
- Parameters
- deepbool, default True
Make a deep copy of the data. With deep=False the data is not copied.
- Returns
- copyIndex
-
cos()¶ Get Trigonometric cosine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.cos()
0    1.000000
1    0.947861
2    0.877583
3    0.525322
4   -0.448074
5   -0.598460
6   -0.283691
dtype: float64
cos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.cos()
      first    second
0  1.000000  0.862319
1  0.283662 -0.283691
2 -0.839072 -0.839039
3 -0.759688 -0.022097
cos operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.cos()
Float64Index([0.9210609940028851, 0.8623188722876839, -0.5984600690578581, -0.4480736161291701], dtype='float64')
-
difference(other, sort=None)¶ Return a new Index with elements from the index that are not in other.
This is the set difference of two Index objects.
- Parameters
- otherIndex or array-like
- sortFalse or None, default None
Whether to sort the resulting index. By default, the values are attempted to be sorted, but any TypeError from incomparable elements is caught by cudf.
None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.
False : Do not sort the result.
- Returns
- differenceIndex
Examples
>>> import cudf
>>> idx1 = cudf.Index([2, 1, 3, 4])
>>> idx1
Int64Index([2, 1, 3, 4], dtype='int64')
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx2
Int64Index([3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
-
drop_duplicates(keep='first')¶ Return Index with duplicate values removed
- Parameters
- keep{‘first’, ‘last’, False}, default ‘first’
- ‘first’ : Drop duplicates except for the first occurrence.
- ‘last’ : Drop duplicates except for the last occurrence.
- False : Drop all duplicates.
- Returns
- deduplicatedIndex
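The three keep options can be compared on a plain list. The sketch below is a pure-Python model, not cuDF code: drop_duplicates_model is a hypothetical name, and it keeps surviving values in their original order, which is a choice of this sketch since the description above does not state the order of cuDF's result.

```python
from collections import Counter


def drop_duplicates_model(values, keep="first"):
    """Keep one copy per value ('first' or 'last'), or drop every duplicated value (False)."""
    if keep is False:
        counts = Counter(values)
        return [v for v in values if counts[v] == 1]
    if keep == "first":
        scan = values
    elif keep == "last":
        scan = values[::-1]  # scanning backwards keeps the last occurrence
    else:
        raise ValueError("keep must be 'first', 'last', or False")
    seen, kept = set(), []
    for v in scan:
        if v not in seen:
            seen.add(v)
            kept.append(v)
    return kept if keep == "first" else kept[::-1]


data = [3, 1, 3, 2, 1]
print(drop_duplicates_model(data))               # [3, 1, 2]
print(drop_duplicates_model(data, keep="last"))  # [3, 2, 1]
print(drop_duplicates_model(data, keep=False))   # [2]
```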
-
dropna(how='any')¶ Return an Index with null values removed.
- Parameters
- how{‘any’, ‘all’}, default ‘any’
If the Index is a MultiIndex, drop the value when any or all levels are NaN.
- Returns
- validIndex
Examples
>>> import cudf
>>> index = cudf.Index(['a', None, 'b', 'c'])
>>> index
StringIndex(['a' None 'b' 'c'], dtype='object')
>>> index.dropna()
StringIndex(['a' 'b' 'c'], dtype='object')
Using dropna on a MultiIndex:
>>> midx = cudf.MultiIndex(
...     levels=[[1, None, 4, None], [1, 2, 5]],
...     codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...     names=["x", "y"],
... )
>>> midx
MultiIndex(levels=[0       1
1    null
2       4
3    null
dtype: int64, 0    1
1    2
2    5
dtype: int64],
codes=   x  y
0  0  0
1  0  2
2  1  1
3  2  1
4  3  0)
>>> midx.dropna()
MultiIndex(levels=[0    1
1    4
dtype: int64, 0    1
1    2
2    5
dtype: int64],
codes=   x  y
0  0  0
1  0  2
2  1  1)
-
property
dtype¶ dtype of the underlying values in GenericIndex.
-
property
empty¶ Indicator whether Index is empty.
True if Index is entirely empty (no items).
- Returns
- outbool
If Index is empty, return True, if not return False.
-
equals(other)¶ Determine if two Index objects contain the same elements.
- Returns
- out: bool
True if “other” is an Index and it has the same elements as calling index; False otherwise.
-
exp()¶ Get the exponential of all elements, element-wise.
Exponential is the inverse of the log function, so that x.exp().log() = x
- Returns
- DataFrame/Series/Index
Result of the element-wise exponential.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.exp()
0    3.678794e-01
1    1.000000e+00
2    2.718282e+00
3    1.383117e+00
4    1.648721e+00
5    4.539993e-05
6    2.688117e+43
dtype: float64
exp operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.exp()
      first        second
0  0.367879      1.263644
1  0.000045      1.349859
2  1.648721  22026.465795
exp operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.exp()
Float64Index([0.36787944117144233, 1.4918246976412703, 2.718281828459045, 1.0, 1.3498588075760032], dtype='float64')
-
fillna(value, downcast=None)¶ Fill null values with the specified value.
- Parameters
- valuescalar
Scalar value to use to fill nulls. This value cannot be list-like.
- downcastdict, default is None
This parameter is currently NON-FUNCTIONAL.
- Returns
- filledIndex
Examples
>>> import cudf
>>> index = cudf.Index([1, 2, None, 4])
>>> index
Int64Index([1, 2, null, 4], dtype='int64')
>>> index.fillna(3)
Int64Index([1, 2, 3, 4], dtype='int64')
-
find_label_range(first, last)¶ Find range that starts with first and ends with last, inclusively.
- Returns
- begin, end2-tuple of int
The starting index and the ending index. The last value occurs at position end - 1.
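For a sorted index, the inclusive label range maps onto a half-open position range, which is why the last matching label sits at end - 1. A pure-Python sketch of that mapping, assuming ascending, null-free labels (find_label_range_model is a hypothetical name; this is not cuDF's implementation):

```python
from bisect import bisect_left, bisect_right


def find_label_range_model(labels, first, last):
    """(begin, end) such that labels[begin:end] covers first..last inclusively."""
    begin = bisect_left(labels, first)  # first position with label >= first
    end = bisect_right(labels, last)    # one past the last position with label <= last
    return begin, end


labels = [10, 20, 30, 40, 50]
begin, end = find_label_range_model(labels, 20, 40)
print(begin, end)         # 1 4
print(labels[begin:end])  # [20, 30, 40]; the last label is at position end - 1
```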
-
classmethod
from_pandas(index, nan_as_null=None)¶ Convert from a Pandas Index.
- Parameters
- indexPandas Index object
A Pandas Index object which has to be converted to cuDF Index.
- nan_as_nullbool, Default None
If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.
- Raises
- TypeError for invalid input type.
Examples
>>> import cudf
>>> import pandas as pd
>>> import numpy as np
>>> data = [10, 20, 30, np.nan]
>>> pdi = pd.Index(data)
>>> cudf.core.index.Index.from_pandas(pdi)
Index(['10.0', '20.0', '30.0', 'null'], dtype='object')
>>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False)
Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
-
get_level_values(level)¶ Return an Index of values for requested level.
This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.
- Parameters
- levelint or str
It is either the integer position or the name of the level.
- Returns
- Index
Calling object, as there is only one level in the Index.
See also
cudf.core.multiindex.MultiIndex.get_level_valuesGet values for a level of a MultiIndex.
Notes
For Index, level should be 0, since there are no multiple levels.
Examples
>>> import cudf
>>> idx = cudf.core.index.StringIndex(["a", "b", "c"])
>>> idx.get_level_values(0)
StringIndex(['a' 'b' 'c'], dtype='object')
-
get_slice_bound(label, side, kind)¶ Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.
- Parameters
- labelobject
- side{‘left’, ‘right’}
- kind{‘ix’, ‘loc’, ‘getitem’}
- Returns
- int
Index of label.
-
property
gpu_values¶ View the data as a numba device array object
-
interleave_columns()¶ Interleave Series columns of a table into a single column.
Converts the column major table cols into a row major column.
- Parameters
- colsinput Table containing columns to interleave.
- Returns
- The interleaved columns as a single column
Examples
>>> import cudf
>>> df = cudf.DataFrame({0: ['A1', 'A2', 'A3'], 1: ['B1', 'B2', 'B3']})
>>> df
    0   1
0  A1  B1
1  A2  B2
2  A3  B3
>>> df.interleave_columns()
0    A1
1    B1
2    A2
3    B2
4    A3
5    B3
dtype: object
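"Column major to row major" means the output walks the table row by row, emitting every column's value for row 0, then row 1, and so on. A pure-Python sketch of that ordering, taking a list of equal-length columns (interleave_columns_model is a hypothetical name, not cuDF code):

```python
def interleave_columns_model(columns):
    """Row-major flattening: row 0 of every column, then row 1, and so on."""
    if not columns:
        return []
    n_rows = len(columns[0])
    if any(len(col) != n_rows for col in columns):
        raise ValueError("all columns must have the same length")
    return [col[i] for i in range(n_rows) for col in columns]


print(interleave_columns_model([["A1", "A2", "A3"], ["B1", "B2", "B3"]]))
# ['A1', 'B1', 'A2', 'B2', 'A3', 'B3']
```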
-
property
is_monotonic¶ Alias for is_monotonic_increasing.
-
property
is_monotonic_decreasing¶ Return whether the index is monotonic decreasing (only equal or decreasing values).
-
property
is_monotonic_increasing¶ Return whether the index is monotonic increasing (only equal or increasing values).
-
property
is_unique¶ Return whether the index has unique values.
-
isin(values)¶ Return a boolean array where the index values are in values.
Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.
- Parameters
- valuesset, list-like, Index
Sought values.
- Returns
- is_containedcupy array
CuPy array of boolean values.
-
isna()¶ Identify missing values. Alias for isnull
-
isnull()¶ Identify missing values.
-
join(other, how='left', level=None, return_indexers=False, sort=False)¶ Compute join_index and indexers to conform data structures to the new index.
- Parameters
- otherIndex.
- how{‘left’, ‘right’, ‘inner’, ‘outer’}
- return_indexersbool, default False
- sortbool, default False
Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).
- Returns
- Index
Examples
>>> import cudf
>>> lhs = cudf.DataFrame(
...     {"a": [2, 3, 1], "b": [3, 4, 2]}).set_index(['a', 'b']).index
>>> rhs = cudf.DataFrame({"a": [1, 4, 3]}).set_index('a').index
>>> lhs.join(rhs, how='inner')
MultiIndex(levels=[0    1
1    3
dtype: int64, 0    2
1    4
dtype: int64],
codes=   a  b
0  1  1
1  0  0)
-
log()¶ Get the natural logarithm of all elements, element-wise.
The natural logarithm is the inverse of the exp function, so that x.log().exp() = x for positive x.
- Returns
- DataFrame/Series/Index
Result of the element-wise natural logarithm.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.log()
0         NaN
1        -inf
2    0.000000
3   -1.125963
4   -0.693147
5         NaN
6    4.605170
dtype: float64
log operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.log()
      first    second
0       NaN -1.452434
1       NaN -1.203973
2 -0.693147  2.302585
log operation on Index:
>>> index = cudf.Index([10, 11, 500.0])
>>> index
Float64Index([10.0, 11.0, 500.0], dtype='float64')
>>> index.log()
Float64Index([2.302585092994046, 2.3978952727983707, 6.214608098422191], dtype='float64')
-
mask(cond, other=None, inplace=False)¶ Replace values where the condition is True.
- Parameters
- condbool Series/DataFrame, array-like
Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.
- other: scalar, list of scalars, Series/DataFrame
Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.
For a DataFrame, other must be a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.
For a Series, other must be a scalar or a Series-like object of the same length.
- inplacebool, default False
Whether to perform the operation in place on the data.
- Returns
- Same type as caller
Examples
>>> import cudf
>>> df = cudf.DataFrame({"A": [1, 4, 5], "B": [3, 5, 8]})
>>> df.mask(df % 2 == 0, [-1, -1])
    A   B
0   1   3
1  -1   5
2   5  -1
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.mask(ser > 2, 10)
0    10
1    10
2     2
3     1
4     0
dtype: int64
>>> ser.mask(ser > 2)
0    null
1    null
2       2
3       1
4       0
dtype: int64
-
max()¶ Return the maximum value of the Index.
- Returns
- scalar
Maximum value.
See also
Index.minReturn the minimum value in an Index.
cudf.core.series.Series.maxReturn the maximum value in a Series.
cudf.core.dataframe.DataFrame.maxReturn the maximum values in a DataFrame.
Examples
>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.max()
3
-
memory_usage(deep=False)¶ Memory usage of the values.
- Parameters
- deepbool
Introspect the data deeply, interrogate object dtypes for system-level memory consumption.
- Returns
- bytes used
-
min()¶ Return the minimum value of the Index.
- Returns
- scalar
Minimum value.
See also
Index.maxReturn the maximum value in an Index.
cudf.core.series.Series.minReturn the minimum value in a Series.
cudf.core.dataframe.DataFrame.minReturn the minimum values in a DataFrame.
Examples
>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.min()
1
-
property
name¶ Returns the name of the Index.
-
property
names¶ Returns a tuple containing the name of the Index.
-
property
ndim¶ Dimension of the data. Apart from MultiIndex, ndim is always 1.
-
notna()¶ Identify non-missing values. Alias for notnull.
-
notnull()¶ Identify non-missing values.
-
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)¶ Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.
- Parameters
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
Axis along which to rank.
- method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’
How to rank the group of records that have the same value (i.e. ties):
* average: average rank of the group
* min: lowest rank in the group
* max: highest rank in the group
* first: ranks assigned in the order they appear in the array
* dense: like ‘min’, but the rank always increases by 1 between groups.
- numeric_onlybool, optional
For DataFrame objects, rank only numeric columns if set to True.
- na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’
How to rank NaN values:
* keep: assign NaN rank to NaN values
* top: assign the smallest rank to NaN values if ascending
* bottom: assign the highest rank to NaN values if ascending.
- ascendingbool, default True
Whether or not the elements should be ranked in ascending order.
- pctbool, default False
Whether or not to display the returned rankings in percentile form.
- Returns
- same type as caller
Return a Series or DataFrame with data ranks as values.
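To make the tie-breaking rules concrete, here is a pure-Python sketch of ascending ranking for the five method values listed above. It is an illustration under stated assumptions (one-dimensional input, no missing values, so na_option and pct are not modelled), not cuDF's algorithm; rank_values is a hypothetical name, and it returns floats, as pandas does.

```python
def rank_values(values, method="average"):
    """Ascending 1-based ranks with the documented tie handling (no nulls)."""
    if method not in ("average", "min", "max", "first", "dense"):
        raise ValueError("unknown method: " + str(method))
    # Positions in ascending order of value; sorted() is stable, so ties keep
    # their original order, which is what method="first" relies on.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    group = 0  # number of distinct values seen so far (used by "dense")
    start = 0
    while start < len(order):
        # Extend `end` over the run of values tied with order[start].
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        group += 1
        for k in range(start, end + 1):
            if method == "average":
                r = (start + end) / 2 + 1
            elif method == "min":
                r = start + 1
            elif method == "max":
                r = end + 1
            elif method == "first":
                r = k + 1
            else:  # "dense"
                r = group
            ranks[order[k]] = float(r)
        start = end + 1
    return ranks


data = [30, 10, 20, 20]
for m in ("average", "min", "max", "first", "dense"):
    print(m, rank_values(data, m))
```

For the tied pair of 20s, average gives 2.5 to both, min gives 2, max gives 3, first gives 2 and 3 in order of appearance, and dense gives 2 while the following value 30 gets 3 rather than 4.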
-
rename(name, inplace=False)¶ Alter Index name.
By default, a new Index is returned.
- Parameters
- namelabel
Name(s) to set.
- Returns
- Index
-
repeat(repeats, axis=None)¶ Repeats elements consecutively.
Returns a new object of the caller's type (DataFrame/Series/Index) in which each element of the current object is repeated consecutively a given number of times.
- Parameters
- repeatsint, or array of ints
The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.
- Returns
- Series/DataFrame/Index
A newly created object of same type as caller with repeated elements.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})
>>> df
   a   b
0  1  10
1  2  20
2  3  30
>>> df.repeat(3)
   a   b
0  1  10
0  1  10
0  1  10
1  2  20
1  2  20
1  2  20
2  3  30
2  3  30
2  3  30
Repeat on Series
>>> s = cudf.Series([0, 2])
>>> s
0    0
1    2
dtype: int64
>>> s.repeat([3, 4])
0    0
0    0
0    0
1    2
1    2
1    2
1    2
dtype: int64
>>> s.repeat(2)
0    0
0    0
1    2
1    2
dtype: int64
Repeat on Index
>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33, 33, 33, 33, 33, 55, 55, 55, 55, 55], dtype='int64')
-
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)¶ Return a random sample of items from an axis of object.
You can use random_state for reproducibility.
- Parameters
- nint, optional
Number of items from axis to return. Cannot be used with frac. Defaults to 1 if frac is None.
- fracfloat, optional
Fraction of axis items to return. Cannot be used with n.
- replacebool, default False
Allow or disallow sampling of the same row more than once. replace=True is not yet supported for axis=1/‘columns’.
- weightsstr or ndarray-like, optional
Only supported for axis=1/‘columns’.
- random_stateint or None, default None
Seed for the random number generator (if int), or None. If None, a random seed will be chosen.
- axis{0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to sample. Accepts axis number or name. Defaults to the stat axis for the given data type (0 for Series and DataFrames). Series and Index don’t support axis=1.
- Returns
- Series or DataFrame or Index
A new object of same type as caller containing n items randomly sampled from the caller object.
Examples
>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1
>>> sr = cudf.Series([1, 2, 3, 4, 5])
>>> sr.sample(10, replace=True)
1    4
3    1
2    4
0    5
0    1
4    5
4    1
0    2
0    3
3    2
dtype: int64
>>> df = cudf.DataFrame(
...     {"a": [1, 2], "b": [2, 3], "c": [3, 4], "d": [4, 5]})
>>> df.sample(2, axis=1)
   a  c
0  1  3
1  2  4
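The interplay of n, frac, replace and random_state can be sketched with Python's random module. This is only an analogy: cuDF uses its own random number generator, so the same seed will not select the same rows as this sketch, and converting frac to a count by rounding is an assumption. sample_positions is a hypothetical helper that returns row positions rather than rows.

```python
import random


def sample_positions(length, n=None, frac=None, replace=False, random_state=None):
    """Hypothetical helper: choose row positions the way sample() is documented to."""
    if n is not None and frac is not None:
        raise ValueError("n and frac cannot be used together")
    if n is None:
        # n defaults to 1 when frac is None; rounding frac * length is an assumption.
        n = 1 if frac is None else round(frac * length)
    rng = random.Random(random_state)  # an int seed makes the draw reproducible
    if replace:
        # With replacement, the same position may be drawn more than once.
        return [rng.randrange(length) for _ in range(n)]
    # Without replacement, positions are distinct; n > length raises ValueError.
    return rng.sample(range(length), n)


print(sample_positions(5, n=3, random_state=42))
print(sample_positions(5, frac=0.4, random_state=0))
print(sample_positions(5, n=10, replace=True, random_state=1))
```

Calling the sketch twice with the same integer random_state yields the same positions, which is the reproducibility the random_state parameter is meant to provide.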
-
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)¶ Scatter to a list of dataframes.
Uses map_index to determine the destination of each row of the original DataFrame.
- Parameters
- map_indexSeries, str or list-like
Scatter assignment for each row
- map_sizeint
Length of the output list. Must be greater than or equal to the number of unique values in map_index.
- keep_indexbool
Preserve the original index values for each row.
- Returns
- A list of cudf.DataFrame objects.
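A pure-Python sketch of the scatter, assuming map_index holds integer partition numbers in the range [0, map_size) and representing each row as an arbitrary Python object. scatter_rows is hypothetical, and keeping the input order inside each partition is a property of this sketch rather than a documented cuDF guarantee.

```python
def scatter_rows(rows, map_index, map_size=None):
    """Hypothetical analogue of scatter_by_map: row i goes to list map_index[i]."""
    if len(rows) != len(map_index):
        raise ValueError("map_index must have one entry per row")
    if map_size is None:
        # Assumption: infer the output length from the largest partition number.
        map_size = max(map_index) + 1 if map_index else 0
    if map_index and max(map_index) >= map_size:
        raise ValueError("map_size is too small for the values in map_index")
    partitions = [[] for _ in range(map_size)]
    for row, destination in zip(rows, map_index):
        partitions[destination].append(row)
    return partitions


parts = scatter_rows(["a", "b", "c", "d", "e"], [1, 0, 1, 2, 0], map_size=4)
print(parts)  # [['b', 'e'], ['a', 'c'], ['d'], []]
```

Choosing a map_size larger than needed simply produces empty partitions, as the trailing empty list shows.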
-
searchsorted(values, side='left', ascending=True, na_position='last')¶ Find indices where elements should be inserted to maintain order.
- Parameters
- valuesFrame (Shape must be consistent with self)
Values to be hypothetically inserted into self.
- sidestr {‘left’, ‘right’} optional, default ‘left’
If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.
- ascendingbool optional, default True
Whether the sorted Frame is in ascending order (otherwise descending).
- na_positionstr {‘last’, ‘first’} optional, default ‘last’
Position of null values in sorted order.
- Returns
- 1-D cupy array of insertion points
Examples
>>> import cudf
>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)
If the values are not monotonically sorted, wrong locations may be returned:
>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0  # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
...                             'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
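For a single numeric column without nulls, the behaviour of side and ascending can be reproduced with the standard-library bisect module. The sketch below is an assumption-laden analogue (insertion_points is hypothetical, and na_position is not modelled); like searchsorted, it returns meaningless positions if the input is not actually sorted.

```python
from bisect import bisect_left, bisect_right


def insertion_points(sorted_values, values, side="left", ascending=True):
    """Hypothetical single-column analogue of searchsorted (no nulls)."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    find = bisect_left if side == "left" else bisect_right
    if ascending:
        keys, probes = list(sorted_values), list(values)
    else:
        # Negating a descending numeric sequence makes it ascending, so the
        # same bisection gives positions in the original descending order.
        keys = [-x for x in sorted_values]
        probes = [-v for v in values]
    return [find(keys, p) for p in probes]


print(insertion_points([1, 2, 3], [0, 4]))                # [0, 3]
print(insertion_points([1, 2, 3], [1, 3], side="right"))  # [1, 3]
print(insertion_points([3, 2, 1], [2], ascending=False))  # [1]
```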
-
property
shape¶ Returns a tuple representing the dimensionality of the Index.
-
shift(periods=1, freq=None, axis=0, fill_value=None)¶ Shift values by periods positions.
-
sin()¶ Get Trigonometric sine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.sin()
0    0.000000
1    0.318683
2    0.479426
3    0.850904
4    0.893997
5   -0.801153
6    0.958916
dtype: float64
sin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.sin()
      first    second
0  0.000000 -0.506366
1 -0.958924  0.958916
2 -0.544021 -0.544072
3  0.650288 -0.999756
sin operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.sin()
Float64Index([-0.3894183423086505, -0.5063656411097588, 0.8011526357338306, 0.8939966636005579], dtype='float64')
-
property
size¶ Return the number of elements in the underlying data.
- Returns
- sizeSize of the DataFrame / Index / Series / MultiIndex
Examples
Size of an empty dataframe is 0.
>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0
DataFrame with values
>>> df = cudf.DataFrame({'a': [10, 11, 12],
...                      'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3
Size of an Index
>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Int64Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4
Size of a MultiIndex
>>> midx = cudf.MultiIndex(
...     levels=[["a", "b", "c", None], ["1", None, "5"]],
...     codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...     names=["x", "y"],
... )
>>> midx
MultiIndex(levels=[0       a
1       b
2       c
3    None
dtype: object, 0       1
1    None
2       5
dtype: object],
codes=   x  y
0  0  0
1  0  2
2  1  1
3  2  1
4  3  0)
>>> midx.size
5
-
sort_values(return_indexer=False, ascending=True, key=None)¶ Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
- Parameters
- return_indexerbool, default False
Should the indices that would sort the index be returned.
- ascendingbool, default True
Should the index values be sorted in an ascending order.
- keyNone, optional
This parameter is NON-FUNCTIONAL.
- Returns
- sorted_indexIndex
Sorted copy of the index.
- indexercupy.ndarray, optional
The indices that the index itself was sorted by.
See also
cudf.core.series.Series.sort_valuesSort values of a Series.
cudf.core.dataframe.DataFrame.sort_valuesSort values in a DataFrame.
Examples
>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')
Sort values in ascending order (default behavior).
>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')
Sort values in descending order, and also get the indices idx was sorted by.
>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2], dtype=int32))
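The relationship between the sorted copy and the returned indexer is that gathering the original values at the indexer positions reproduces the sorted result. A pure-Python sketch, with sorted_with_indexer as a hypothetical name and ties broken stably (an assumption about how cuDF orders equal values):

```python
def sorted_with_indexer(values, ascending=True):
    """Hypothetical analogue of sort_values(return_indexer=True)."""
    # Positions of `values` in sorted order; Python's sort is stable, so
    # equal values keep their original relative order.
    indexer = sorted(range(len(values)), key=lambda i: values[i],
                     reverse=not ascending)
    # Gathering with the indexer yields the sorted values.
    return [values[i] for i in indexer], indexer


print(sorted_with_indexer([10, 100, 1, 1000], ascending=False))
# ([1000, 100, 10, 1], [3, 1, 0, 2])
```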
-
sqrt()¶ Get the non-negative square-root of all elements, element-wise.
- Returns
- DataFrame/Series/Index
Result of the non-negative square-root of each element.
Examples
>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64
sqrt operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-10.0, 100, 625],
...                      'second': [1, 2, 0.4]})
>>> df
   first  second
0  -10.0     1.0
1  100.0     2.0
2  625.0     0.4
>>> df.sqrt()
   first    second
0    NaN  1.000000
1   10.0  1.414214
2   25.0  0.632456
sqrt operation on Index:
>>> index = cudf.Index([-10.0, 100, 625])
>>> index
Float64Index([-10.0, 100.0, 625.0], dtype='float64')
>>> index.sqrt()
Float64Index([nan, 10.0, 25.0], dtype='float64')
-
sum()¶ Return the sum of all values of the Index.
- Returns
- scalar
Sum of all values.
Examples
>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.sum()
6
-
take(indices)¶ Gather only the specific subset of indices
- Parameters
- indices: An array-like that maps to values contained in this Index.
-
tan()¶ Get Trigonometric tangent, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.tan()
0    0.000000
1    0.336213
2    0.546302
3    1.619775
4   -1.995200
5    1.338690
6   -3.380140
dtype: float64
tan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.tan()
      first     second
0  0.000000  -0.587214
1 -3.380515  -3.380140
2  0.648361   0.648446
3 -0.855993  45.244742
tan operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.tan()
Float64Index([-0.4227932187381618, -0.587213915156929, -1.3386902103511544, -1.995200412208242], dtype='float64')
-
tile(count)¶ Repeats the rows from self DataFrame count times to form a new DataFrame.
- Parameters
- selfInput table whose rows are tiled.
- countNumber of times to tile “rows”. Must be non-negative.
- Returns
- The table containing the tiled “rows”.
Examples
>>> import cudf
>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
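tile and repeat both duplicate data but differ in ordering: tile repeats the whole block of rows, while repeat duplicates each element in place. A pure-Python sketch of the difference, with rows modelled as tuples and both helper names hypothetical:

```python
def repeat(values, repeats):
    """Hypothetical analogue of repeat(): each element repeated consecutively."""
    if isinstance(repeats, int):
        repeats = [repeats] * len(values)
    if len(repeats) != len(values) or any(r < 0 for r in repeats):
        raise ValueError("need one non-negative count per element")
    return [v for v, r in zip(values, repeats) for _ in range(r)]


def tile(rows, count):
    """Hypothetical analogue of tile(): the whole sequence repeated count times."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return list(rows) * count


rows = [(8, 4, 7), (5, 2, 3)]
print(tile(rows, 2))           # [(8, 4, 7), (5, 2, 3), (8, 4, 7), (5, 2, 3)]
print(repeat(rows, 2))         # [(8, 4, 7), (8, 4, 7), (5, 2, 3), (5, 2, 3)]
print(repeat([0, 2], [3, 4]))  # [0, 0, 0, 2, 2, 2, 2]
```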
-
to_array(fillna=None)¶ Get a dense numpy array for the data.
- Parameters
- fillnastr or None
Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.
Notes
If fillna is None, null values are skipped. Therefore, the output size could be smaller.
-
to_arrow()¶ Convert Index to a PyArrow Array.
Examples
>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx.to_arrow()
<pyarrow.lib.Int64Array object at 0x7fcaa6f53440>
[
  -3,
  10,
  15,
  20
]
-
to_dlpack()¶ Converts a cuDF object into a DLPack tensor.
DLPack is an open-source memory tensor structure: dmlc/dlpack.
This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep-copies the data from the cuDF object into the DLPack tensor.
- Parameters
- cudf_objDataFrame, Series, Index, or Column
- Returns
- pycapsule_objPyCapsule
Output DLPack tensor pointer which is encapsulated in a PyCapsule object.
-
to_frame(index=True, name=None)¶ Create a DataFrame with a column containing this Index
- Parameters
- indexboolean, default True
Set the index of the returned DataFrame as the original Index
- namestr, default None
Name to be used for the column
- Returns
- DataFrame
cudf DataFrame
-
to_pandas()¶ Convert to a Pandas Index.
Examples
>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.numeric.Int64Index'>
>>> type(idx)
<class 'cudf.core.index.Int64Index'>
-
to_series(index=None, name=None)¶ Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.
- Parameters
- indexIndex, optional
Index of resulting Series. If None, defaults to original index.
- namestr, optional
Name of resulting Series. If None, defaults to name of original index.
- Returns
- Series
The dtype will be based on the type of the Index values.
-
unique()¶ Return unique values in the index.
- Returns
- Index without duplicates
-
property
values¶ Return an array representing the data in the Index.
- Returns
- arrayA cupy array of data in the Index.
Examples
>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values
array([  1, -10, 100,  20])
>>> type(index.values)
<class 'cupy.core.core.ndarray'>
-
property
values_host¶ Return a numpy representation of the Index.
Only the values in the Index will be returned.
- Returns
- outnumpy.ndarray
The values of the Index.
Examples
>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values_host
array([  1, -10, 100,  20])
>>> type(index.values_host)
<class 'numpy.ndarray'>
-
where(cond, other=None)¶ Replace values where the condition is False.
- Parameters
- condbool array-like with the same length as self
Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.
- other: scalar, or array-like
Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.
- Returns
- Same type as caller
Examples
>>> import cudf
>>> index = cudf.Index([4, 3, 2, 1, 0])
>>> index
Int64Index([4, 3, 2, 1, 0], dtype='int64')
>>> index.where(index > 2, 15)
Int64Index([4, 3, 15, 15, 15], dtype='int64')
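where and mask are complements: mask(cond, other) replaces exactly the entries that where(cond, other) keeps. The pure-Python sketch below illustrates this on plain lists, using None to stand in for null; both helpers are hypothetical and ignore dtype handling.

```python
def where(values, cond, other=None):
    """Keep values[i] where cond[i] is True, else take other (None stands in for null)."""
    others = list(other) if isinstance(other, (list, tuple)) else [other] * len(values)
    return [v if c else o for v, c, o in zip(values, cond, others)]


def mask(values, cond, other=None):
    """Complement of where: replace values[i] where cond[i] is True."""
    return where(values, [not c for c in cond], other)


values = [4, 3, 2, 1, 0]
cond = [v > 2 for v in values]
print(where(values, cond, 15))  # [4, 3, 15, 15, 15]
print(mask(values, cond, 10))   # [10, 10, 2, 1, 0]
```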
MultiIndex¶
-
class
cudf.core.multiindex.MultiIndex(levels=None, codes=None, sortorder=None, labels=None, names=None, dtype=None, copy=False, name=None, **kwargs)¶ Immutable, ordered and sliceable sequence of integer labels. The basic object storing row labels for all cuDF objects.
- Parameters
- dataarray-like (1-dimensional)/ DataFrame
If it is a DataFrame, it will return a MultiIndex
- dtypeNumPy dtype (default: object)
If dtype is None, we find the dtype that best fits the data.
- copybool
Make a copy of input data.
- nameobject
Name to be stored in the index.
- tupleize_colsbool (default: True)
When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.
- Returns
- Index
cudf Index
Examples
>>> import cudf
>>> cudf.Index([1, 2, 3], dtype="uint64", name="a")
UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a": [1, 2], "b": [2, 3]}))
MultiIndex(levels=[0    1
1    2
dtype: int64, 0    2
1    3
dtype: int64],
codes=   a  b
0  0  0
1  1  1)
- Attributes
- codes
emptyIndicator whether Index is empty.
gpu_valuesView the data as a numba device array object
- is_contiguous
is_monotonicAlias for is_monotonic_increasing.
is_monotonic_decreasingReturn whether the index is monotonic decreasing (only equal or decreasing values).
is_monotonic_increasingReturn whether the index is monotonic increasing (only equal or increasing values).
is_uniqueReturn whether the index has unique values.
- labels
- levels
nameReturns the name of the Index.
namesReturns a tuple containing the name of the Index.
ndimDimension of the data.
- nlevels
shapeReturns a tuple representing the dimensionality of the Index.
sizeReturn the number of elements in the underlying data.
valuesReturn a CuPy representation of the MultiIndex.
values_hostReturn a numpy representation of the MultiIndex.
Methods
acos()Get Trigonometric inverse cosine, element-wise.
any()Return whether any element is True in Index.
append(other)Append a collection of MultiIndex objects together
argsort([ascending])Return the integer indices that would sort the index.
asin()Get Trigonometric inverse sine, element-wise.
astype(dtype[, copy])Create an Index with values cast to dtype.
atan()Get Trigonometric inverse tangent, element-wise.
clip([lower, upper, inplace, axis])Trim values at input threshold(s).
cos()Get Trigonometric cosine, element-wise.
difference(other[, sort])Return a new Index with elements from the index that are not in other.
drop_duplicates([keep])Return Index with duplicate values removed
dropna([how])Return an Index with null values removed.
equals(other)Determine if two Index objects contain the same elements.
exp()Get the exponential of all elements, element-wise.
fillna(value)Fill null values with the specified value.
from_pandas(multiindex[, nan_as_null])Convert from a Pandas MultiIndex
get_level_values(level)Return the values at the requested level
get_slice_bound(label, side, kind)Calculate slice bound that corresponds to given label.
interleave_columns()Interleave Series columns of a table into a single column.
isin(values[, level])Return a boolean array where the index values are in values.
isna()Identify missing values.
isnull()Identify missing values.
join(other[, how, level, return_indexers, sort])Compute join_index and indexers to conform data structures to the new index.
log()Get the natural logarithm of all elements, element-wise.
mask(cond[, other, inplace])Replace values where the condition is True.
max()Return the maximum value of the Index.
memory_usage([deep])Memory usage of the values.
min()Return the minimum value of the Index.
notna()Identify non-missing values.
notnull()Identify non-missing values.
rank([axis, method, numeric_only, …])Compute numerical data ranks (1 through n) along axis.
rename(name[, inplace])Alter Index name.
repeat(repeats[, axis])Repeats elements consecutively.
sample([n, frac, replace, weights, …])Return a random sample of items from an axis of object.
scatter_by_map(map_index[, map_size, keep_index])Scatter to a list of dataframes.
searchsorted(values[, side, ascending, …])Find indices where elements should be inserted to maintain order
shift([periods, freq, axis, fill_value])Shift values by periods positions.
sin()Get Trigonometric sine, element-wise.
sort_values([return_indexer, ascending, key])Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
sqrt()Get the non-negative square-root of all elements, element-wise.
sum()Return the sum of all values of the Index.
take(indices)Gather only the specific subset of indices
tan()Get Trigonometric tangent, element-wise.
tile(count)Repeats the rows from self DataFrame count times to form a new DataFrame.
to_array([fillna])Get a dense numpy array for the data.
to_arrow()Convert Index to a PyArrow Array.
to_dlpack()Converts a cuDF object into a DLPack tensor.
to_pandas(**kwargs)Convert to a Pandas Index.
to_series([index, name])Create a Series with both index and values equal to the index keys.
unique()Return unique values in the index.
where(cond[, other, inplace])Replace values where the condition is False.
array_equal
copy
deepcopy
from_frame
from_product
from_tuples
nan_to_num
replace
to_frame
-
acos()¶ Get Trigonometric inverse cosine, element-wise.
The inverse of cos, so that if y = x.cos(), then x = y.acos() for x in [0, π].
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64
acos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629
acos operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([3.141592653589793, 1.1592794807274085, 0.0, 1.5707963267948966, 1.266103672779499], dtype='float64')
-
any()¶ Return whether any element is True in Index.
-
append(other)¶ Append a collection of MultiIndex objects together
- Parameters
- otherMultiIndex or list/tuple of MultiIndex objects
- Returns
- appendedIndex
Examples
>>> import cudf
>>> idx1 = cudf.MultiIndex(
...     levels=[[1, 2], ['blue', 'red']],
...     codes=[[0, 0, 1, 1], [1, 0, 1, 0]])
>>> idx2 = cudf.MultiIndex(
...     levels=[[3, 4], ['blue', 'red']],
...     codes=[[0, 0, 1, 1], [1, 0, 1, 0]])
>>> idx1
MultiIndex(levels=[0    1
1    2
dtype: int64, 0    blue
1     red
dtype: object],
codes=   0  1
0  0  1
1  0  0
2  1  1
3  1  0)
>>> idx2
MultiIndex(levels=[0    3
1    4
dtype: int64, 0    blue
1     red
dtype: object],
codes=   0  1
0  0  1
1  0  0
2  1  1
3  1  0)
>>> idx1.append(idx2)
MultiIndex(levels=[0    1
1    2
2    3
3    4
dtype: int64, 0    blue
1     red
dtype: object],
codes=   0  1
0  0  1
1  0  0
2  1  1
3  1  0
4  2  1
5  2  0
6  3  1
7  3  0)
-
argsort(ascending=True, **kwargs)¶ Return the integer indices that would sort the index.
- Parameters
- ascendingbool, default True
If True, returns the indices for ascending order. If False, returns the indices for descending order.
- Returns
- arrayA cupy array containing integer indices that would sort the index if used as an indexer.
-
asin()¶ Get Trigonometric inverse sine, element-wise.
The inverse of sine, so that if y = x.sin(), then x = y.asin() for x in [-π/2, π/2].
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.asin()
0   -1.570796
1    0.000000
2    1.570796
3    0.330314
4    0.523599
dtype: float64
asin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.asin()
      first    second
0 -1.570796  0.236190
1  0.000000  0.304693
2  0.523599  0.100167
asin operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64')
>>> index.asin()
Float64Index([-1.5707963267948966, 0.41151684606748806, 1.5707963267948966, 0.3046926540153975], dtype='float64')
-
astype(dtype, copy=False)¶ Create an Index with values cast to dtype. The class of the new Index is determined by dtype. When conversion is impossible, a ValueError is raised.
- Parameters
- dtypenumpy dtype
Use a numpy.dtype to cast entire Index object to.
- copybool, default False
If copy is True, astype always returns a newly allocated object. If copy is False (the default) and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.
- Returns
- Index
Index with values cast to specified dtype.
-
atan()¶ Get Trigonometric inverse tangent, element-wise.
The inverse of tan, so that if y = x.tan(), then x = y.atan() for x in (-π/2, π/2).
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10])
>>> ser
0    -1.00000
1     0.00000
2     1.00000
3     0.32434
4     0.50000
5   -10.00000
dtype: float64
>>> ser.atan()
0   -0.785398
1    0.000000
2    0.785398
3    0.313635
4    0.463648
5   -1.471128
dtype: float64
atan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.atan()
      first    second
0 -0.785398  0.229864
1 -1.471128  0.291457
2  0.463648  1.471128
atan operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.atan()
Float64Index([-0.7853981633974483, 0.3805063771123649, 0.7853981633974483, 0.0, 0.2914567944778671], dtype='float64')
-
clip(lower=None, upper=None, inplace=False, axis=1)¶ Trim values at input threshold(s).
Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.
- Parameters
- lowerscalar or array_like, default None
Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.
- upperscalar or array_like, default None
Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series/Index, upper is expected to be a scalar or an array of size 1.
- inplacebool, default False
- Returns
- Clipped DataFrame/Series/Index/MultiIndex
Examples
>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4], "b": ['a', 'b', 'c', 'd']})
>>> df.clip(lower=[2, 'b'], upper=[3, 'c'])
   a  b
0  2  b
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=None, upper=[3, 'c'])
   a  b
0  1  a
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=[2, 'b'], upper=None)
   a  b
0  2  b
1  2  b
2  3  c
3  4  d
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4], "b": [1, 2, 3, 4]})
>>> df.clip(lower=2, upper=3, inplace=True)
>>> df
   a  b
0  2  2
1  2  2
2  3  3
3  3  3
>>> import cudf
>>> sr = cudf.Series([1, 2, 3, 4])
>>> sr.clip(lower=2, upper=3)
0    2
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=None, upper=3)
0    1
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True)
>>> sr
0    2
1    2
2    3
3    4
dtype: int64
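For a single column, the thresholds act element by element. This pure-Python sketch (clip is a hypothetical helper operating on a list) mirrors the Series examples above and, for comparable values such as strings, the per-column bounds used in the DataFrame examples.

```python
def clip(values, lower=None, upper=None):
    """Element-wise clip; a bound of None means no clipping on that side."""
    clipped = []
    for v in values:
        if lower is not None and v < lower:
            v = lower  # raise values below the lower threshold
        if upper is not None and v > upper:
            v = upper  # lower values above the upper threshold
        clipped.append(v)
    return clipped


print(clip([1, 2, 3, 4], lower=2, upper=3))  # [2, 2, 3, 3]
print(clip([1, 2, 3, 4], upper=3))           # [1, 2, 3, 3]
print(clip(["a", "b", "c", "d"], "b", "c"))  # ['b', 'b', 'c', 'c']
```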
-
cos()¶ Get Trigonometric cosine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.cos()
0    1.000000
1    0.947861
2    0.877583
3    0.525322
4   -0.448074
5   -0.598460
6   -0.283691
dtype: float64
cos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.cos()
      first    second
0  1.000000  0.862319
1  0.283662 -0.283691
2 -0.839072 -0.839039
3 -0.759688 -0.022097
cos operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.cos()
Float64Index([0.9210609940028851, 0.8623188722876839, -0.5984600690578581, -0.4480736161291701], dtype='float64')
-
difference(other, sort=None)
Return a new Index with elements from the index that are not in other.
This is the set difference of two Index objects.
- Parameters
- other : Index or array-like
- sort : False or None, default None
Whether to sort the resulting index. By default, cuDF attempts to sort the values but catches any TypeError raised by incomparable elements.
None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.
False : Do not sort the result.
- Returns
- difference : Index
Examples
>>> import cudf
>>> idx1 = cudf.Index([2, 1, 3, 4])
>>> idx1
Int64Index([2, 1, 3, 4], dtype='int64')
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx2
Int64Index([3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
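The set-difference behavior and the two `sort` options can be sketched on plain lists. This is an illustration under the assumption that the result holds unique values (as a set difference does). `index_difference` is a hypothetical name, and cuDF performs the operation on the GPU rather than like this.

```python
def index_difference(left, right, sort=None):
    """Hypothetical sketch of the set difference left - right on plain lists."""
    excluded = set(right)
    seen = set()
    result = []
    for value in left:
        # keep the first occurrence of each value that is not in `right`
        if value not in excluded and value not in seen:
            seen.add(value)
            result.append(value)
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            pass  # incomparable elements: keep the unsorted result
    return result


print(index_difference([2, 1, 3, 4], [3, 4, 5, 6]))              # [1, 2]
print(index_difference([2, 1, 3, 4], [3, 4, 5, 6], sort=False))  # [2, 1]
```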
-
drop_duplicates(keep='first')
Return Index with duplicate values removed.
- Parameters
- keep : {‘first’, ‘last’, False}, default ‘first’
- ‘first’ : Drop duplicates except for the first occurrence.
- ‘last’ : Drop duplicates except for the last occurrence.
- False : Drop all duplicates.
- Returns
- deduplicated : Index
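The three `keep` options can be sketched in plain Python as follows. The sketch preserves input order in its result; that ordering is an assumption of the sketch, and cuDF may return the surviving values in a different order. `drop_duplicates` here is a hypothetical standalone function, not the cuDF method.

```python
def drop_duplicates(values, keep="first"):
    """Hypothetical sketch of the keep='first' / 'last' / False options."""
    if keep is False:
        counts = {}
        for v in values:
            counts[v] = counts.get(v, 0) + 1
        # drop every value that occurs more than once
        return [v for v in values if counts[v] == 1]
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first', 'last' or False")
    # for 'last', scan from the end so the last occurrence is the one kept
    ordered = values if keep == "first" else list(reversed(values))
    seen, kept = set(), []
    for v in ordered:
        if v not in seen:
            seen.add(v)
            kept.append(v)
    # restore the original relative order for 'last'
    return kept if keep == "first" else kept[::-1]


print(drop_duplicates([1, 2, 1, 3, 2]))              # [1, 2, 3]
print(drop_duplicates([1, 2, 1, 3, 2], keep="last"))  # [1, 3, 2]
print(drop_duplicates([1, 2, 1, 3, 2], keep=False))   # [3]
```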
-
dropna(how='any')
Return an Index with null values removed.
- Parameters
- how : {‘any’, ‘all’}, default ‘any’
If the Index is a MultiIndex, drop the value when any or all levels are NaN.
- Returns
- valid : Index
Examples
>>> import cudf
>>> index = cudf.Index(['a', None, 'b', 'c'])
>>> index
StringIndex(['a' None 'b' 'c'], dtype='object')
>>> index.dropna()
StringIndex(['a' 'b' 'c'], dtype='object')
Using dropna on a MultiIndex:
>>> midx = cudf.MultiIndex(
...         levels=[[1, None, 4, None], [1, 2, 5]],
...         codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...         names=["x", "y"],
...     )
>>> midx
MultiIndex(levels=[0       1
1    null
2       4
3    null
dtype: int64, 0    1
1    2
2    5
dtype: int64],
codes=   x  y
0  0  0
1  0  2
2  1  1
3  2  1
4  3  0)
>>> midx.dropna()
MultiIndex(levels=[0    1
1    4
dtype: int64, 0    1
1    2
2    5
dtype: int64],
codes=   x  y
0  0  0
1  0  2
2  1  1)
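To see why three rows survive in the MultiIndex example, the sketch below decodes `levels` and `codes` into row tuples and applies the `how` rule, with Python `None` standing in for null. It is a plain-Python illustration; `dropna_rows` is a hypothetical helper, not part of cuDF.

```python
def dropna_rows(rows, how="any"):
    """Hypothetical sketch: drop MultiIndex rows (tuples) that contain nulls."""
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")
    test = any if how == "any" else all
    return [row for row in rows if not test(v is None for v in row)]


# Decode the example's levels/codes: row i takes levels[k][codes[k][i]] per level k.
levels = [[1, None, 4, None], [1, 2, 5]]
codes = [[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]]
rows = [tuple(levels[k][c] for k, c in enumerate(pos)) for pos in zip(*codes)]

print(rows)                           # [(1, 1), (1, 5), (None, 2), (4, 2), (None, 1)]
print(dropna_rows(rows))              # [(1, 1), (1, 5), (4, 2)]
print(len(dropna_rows(rows, "all")))  # 5: no row is entirely null
```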
-
property
empty
Indicator whether Index is empty.
True if Index is entirely empty (no items).
- Returns
- out : bool
True if the Index is empty, False otherwise.
-
equals(other)
Determine if two Index objects contain the same elements.
- Returns
- out: bool
True if “other” is an Index and it has the same elements as calling index; False otherwise.
-
exp()
Get the exponential of all elements, element-wise.
Exponential is the inverse of the log function, so that x.exp().log() = x.
- Returns
- DataFrame/Series/Index
Result of the element-wise exponential.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.exp()
0    3.678794e-01
1    1.000000e+00
2    2.718282e+00
3    1.383117e+00
4    1.648721e+00
5    4.539993e-05
6    2.688117e+43
dtype: float64
exp operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.exp()
      first        second
0  0.367879      1.263644
1  0.000045      1.349859
2  1.648721  22026.465795
exp operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.exp()
Float64Index([0.36787944117144233, 1.4918246976412703, 2.718281828459045, 1.0, 1.3498588075760032], dtype='float64')
-
fillna(value)
Fill null values with the specified value.
- Parameters
- value : scalar
Scalar value to use to fill nulls. This value cannot be a list-like.
- Returns
- filled : MultiIndex
Examples
>>> import cudf
>>> index = cudf.MultiIndex(
...         levels=[["a", "b", "c", None], ["1", None, "5"]],
...         codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...         names=["x", "y"],
...       )
>>> index
MultiIndex(levels=[0       a
1       b
2       c
3    None
dtype: object, 0       1
1    None
2       5
dtype: object],
codes=   x  y
0  0  0
1  0  2
2  1  1
3  2  1
4  3  0)
>>> index.fillna('hello')
MultiIndex(levels=[0        a
1        b
2        c
3    hello
dtype: object, 0        1
1        5
2    hello
dtype: object],
codes=   x  y
0  0  0
1  0  1
2  1  2
3  2  2
4  3  0)
-
classmethod
from_pandas(multiindex, nan_as_null=None)
Convert from a Pandas MultiIndex.
- Raises
- TypeError for invalid input type.
Examples
>>> import cudf
>>> import pandas as pd
>>> pmi = pd.MultiIndex(levels=[['a', 'b'], ['c', 'd']],
...                     codes=[[0, 1], [1, 1]])
>>> cudf.from_pandas(pmi)
MultiIndex( ... )
-
get_level_values(level)
Return the values at the requested level.
- Parameters
- level : int or label
- Returns
- An Index containing the values at the requested level.
-
get_slice_bound(label, side, kind)
Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.
- Parameters
- label : object
- side : {‘left’, ‘right’}
- kind : {‘ix’, ‘loc’, ‘getitem’}
- Returns
- int
Index of label.
-
property
gpu_values
View the data as a Numba device array object.
-
interleave_columns()
Interleave Series columns of a table into a single column.
Converts the column-major table formed by the calling object's columns into a single row-major column: the first element of every column, then the second element of every column, and so on.
- Returns
- The interleaved columns as a single column.
Examples
>>> import cudf
>>> df = cudf.DataFrame({0: ['A1', 'A2', 'A3'], 1: ['B1', 'B2', 'B3']})
>>> df
    0   1
0  A1  B1
1  A2  B2
2  A3  B3
>>> df.interleave_columns()
0    A1
1    B1
2    A2
3    B2
4    A3
5    B3
dtype: object
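The column-major to row-major conversion can be sketched with `zip`, treating each column as a Python list. This is an illustration of the ordering only; `interleave_columns` here is a hypothetical standalone function, not the cuDF method.

```python
def interleave_columns(columns):
    """Hypothetical sketch: column-major input -> one row-major list."""
    if len({len(col) for col in columns}) > 1:
        raise ValueError("all columns must have the same length")
    # zip(*columns) walks the table row by row; flatten each row in column order
    return [value for row in zip(*columns) for value in row]


print(interleave_columns([["A1", "A2", "A3"], ["B1", "B2", "B3"]]))
# ['A1', 'B1', 'A2', 'B2', 'A3', 'B3']
```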
-
property
is_monotonic
Alias for is_monotonic_increasing.
-
property
is_monotonic_decreasing
Return whether the index has monotonically decreasing (only equal or decreasing) values.
-
property
is_monotonic_increasing
Return whether the index has monotonically increasing (only equal or increasing) values.
-
property
is_unique
Return whether the index has unique values.
-
isin(values, level=None)
Return a boolean array where the index values are in values.
Compute a boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.
- Parameters
- values : set, list-like, Index or MultiIndex
Sought values.
- level : str or int, optional
Name or position of the index level to use (if the index is a MultiIndex).
- Returns
- is_contained : cupy array
CuPy array of boolean values.
- Notes
When `level` is None, `values` can only be a MultiIndex or a set/list-like of tuples.
When `level` is provided, `values` can be an Index, a MultiIndex, or a set/list-like of tuples.
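The difference between matching whole rows and matching a single level can be sketched in plain Python, with MultiIndex rows as tuples. This assumes `level` is an integer position and, when given, that `values` holds plain labels for that level; `multiindex_isin` is a hypothetical helper, not cuDF's implementation.

```python
def multiindex_isin(rows, values, level=None):
    """Hypothetical sketch: rows are tuples with one entry per level."""
    wanted = set(values)
    if level is None:
        # whole rows are compared, so `values` must contain tuples
        return [row in wanted for row in rows]
    # only the entry at position `level` is compared against the labels
    return [row[level] in wanted for row in rows]


rows = [(1, "a"), (2, "b"), (3, "a")]
print(multiindex_isin(rows, [(1, "a"), (3, "b")]))  # [True, False, False]
print(multiindex_isin(rows, ["a"], level=1))        # [True, False, True]
```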
-
isna()
Identify missing values. Alias for isnull.
-
isnull()
Identify missing values.
-
join(other, how='left', level=None, return_indexers=False, sort=False)
Compute join_index and indexers to conform data structures to the new index.
- Parameters
- other : Index
- how : {‘left’, ‘right’, ‘inner’, ‘outer’}
- return_indexers : bool, default False
- sort : bool, default False
Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).
- Returns
- index
Examples
>>> import cudf
>>> lhs = cudf.DataFrame(
...     {"a": [2, 3, 1], "b": [3, 4, 2]}).set_index(['a', 'b']
... ).index
>>> rhs = cudf.DataFrame({"a": [1, 4, 3]}).set_index('a').index
>>> lhs.join(rhs, how='inner')
MultiIndex(levels=[0    1
1    3
dtype: int64, 0    2
1    4
dtype: int64],
codes=   a  b
0  1  1
1  0  0)
-
log()
Get the natural logarithm of all elements, element-wise.
Natural logarithm is the inverse of the exp function, so that x.log().exp() = x for positive x.
- Returns
- DataFrame/Series/Index
Result of the element-wise natural logarithm.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.log()
0         NaN
1        -inf
2    0.000000
3   -1.125963
4   -0.693147
5         NaN
6    4.605170
dtype: float64
log operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.log()
      first    second
0       NaN -1.452434
1       NaN -1.203973
2 -0.693147  2.302585
log operation on Index:
>>> index = cudf.Index([10, 11, 500.0])
>>> index
Float64Index([10.0, 11.0, 500.0], dtype='float64')
>>> index.log()
Float64Index([2.302585092994046, 2.3978952727983707, 6.214608098422191], dtype='float64')
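The edge cases in the log() examples (NaN for negative inputs, -inf for zero) and the exp/log round trip can be checked with the standard `math` module. Because `math.log` raises on zero and negative inputs, the hypothetical helper `natural_log` maps those cases to the values cuDF prints; it is an illustration, not cuDF's implementation.

```python
import math


def natural_log(x):
    """Hypothetical helper matching the examples: log(0) is -inf, log(x < 0) is NaN."""
    if x == 0:
        return float("-inf")
    if x < 0:
        return float("nan")
    return math.log(x)


# exp() undoes log() only on the positive domain.
for x in [0.32434, 0.5, 1.0, 100.0]:
    assert math.isclose(math.exp(natural_log(x)), x)

print(natural_log(0), natural_log(-1))  # -inf nan
print(round(natural_log(100), 6))       # 4.60517
```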
-
mask(cond, other=None, inplace=False)
Replace values where the condition is True.
- Parameters
- cond : bool Series/DataFrame, array-like
Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.
- other : scalar, list of scalars, Series/DataFrame
Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.
A DataFrame expects only a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.
A Series expects only a scalar or a Series-like object of the same length.
- inplace : bool, default False
Whether to perform the operation in place on the data.
- Returns
- Same type as caller
Examples
>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.mask(df % 2 == 0, [-1, -1])
   A   B
0  1   3
1 -1   5
2  5  -1

>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.mask(ser > 2, 10)
0    10
1    10
2     2
3     1
4     0
dtype: int64
>>> ser.mask(ser > 2)
0    null
1    null
2       2
3       1
4       0
dtype: int64
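mask() is the complement of where(): it replaces values where the condition is True instead of where it is False. The plain-Python sketch below (hypothetical helpers on lists, with `None` standing in for null) reproduces the Series results above.

```python
def where(values, cond, other=None):
    """Hypothetical sketch: keep values where cond is True, else use `other`."""
    return [v if keep else other for v, keep in zip(values, cond)]


def mask(values, cond, other=None):
    """Hypothetical sketch: mask() is where() with the condition inverted."""
    return where(values, [not c for c in cond], other)


ser = [4, 3, 2, 1, 0]
print(mask(ser, [v > 2 for v in ser], 10))  # [10, 10, 2, 1, 0]
print(mask(ser, [v > 2 for v in ser]))      # [None, None, 2, 1, 0]
```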
-
max()
Return the maximum value of the Index.
- Returns
- scalar
Maximum value.
See also
Index.min : Return the minimum value in an Index.
cudf.core.series.Series.max : Return the maximum value in a Series.
cudf.core.dataframe.DataFrame.max : Return the maximum values in a DataFrame.
Examples
>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.max()
3
-
memory_usage(deep=False)
Memory usage of the values.
- Parameters
- deep : bool
Introspect the data deeply, interrogate object dtypes for system-level memory consumption.
- Returns
- Bytes used.
-
min()
Return the minimum value of the Index.
- Returns
- scalar
Minimum value.
See also
Index.max : Return the maximum value in an Index.
cudf.core.series.Series.min : Return the minimum value in a Series.
cudf.core.dataframe.DataFrame.min : Return the minimum values in a DataFrame.
Examples
>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.min()
1
-
property
name
Returns the name of the Index.
-
property
names
Returns a tuple containing the name of the Index.
-
property
ndim
Dimension of the data. For a MultiIndex, ndim is always 2.
-
notna()
Identify non-missing values. Alias for notnull.
-
notnull()
Identify non-missing values.
-
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)
Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.
- Parameters
- axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Index to direct ranking.
- method : {‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’
How to rank the group of records that have the same value (i.e. ties):
* average: average rank of the group
* min: lowest rank in the group
* max: highest rank in the group
* first: ranks assigned in order they appear in the array
* dense: like ‘min’, but rank always increases by 1 between groups.
- numeric_only : bool, optional
For DataFrame objects, rank only numeric columns if set to True.
- na_option : {‘keep’, ‘top’, ‘bottom’}, default ‘keep’
How to rank NaN values:
* keep: assign NaN rank to NaN values
* top: assign smallest rank to NaN values if ascending
* bottom: assign highest rank to NaN values if ascending.
- ascending : bool, default True
Whether or not the elements should be ranked in ascending order.
- pct : bool, default False
Whether or not to display the returned rankings in percentile form.
- Returns
- same type as caller
Return a Series or DataFrame with data ranks as values.
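The five tie-breaking methods can be sketched in plain Python for a list without nulls. The sketch ignores `na_option` and `pct`, and `rank` is a hypothetical standalone function that follows the definitions listed above, not cuDF's GPU implementation.

```python
def rank(values, method="average", ascending=True):
    """Hypothetical sketch of the tie-breaking methods (no nulls, no pct)."""
    n = len(values)
    # stable sort of positions; reverse=True still keeps ties in original order
    order = sorted(range(n), key=values.__getitem__, reverse=not ascending)
    ranks = [None] * n
    dense = 0
    start = 0
    while start < n:
        end = start
        # extend `end` across the run of values tied with order[start]
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        dense += 1
        for k in range(start, end + 1):
            pos = order[k]
            if method == "average":
                ranks[pos] = (start + end) / 2 + 1
            elif method == "min":
                ranks[pos] = start + 1
            elif method == "max":
                ranks[pos] = end + 1
            elif method == "first":
                ranks[pos] = k + 1
            elif method == "dense":
                ranks[pos] = dense
            else:
                raise ValueError("unknown method: %r" % method)
        start = end + 1
    return ranks


data = [10, 20, 10, 30]
for m in ["average", "min", "max", "first", "dense"]:
    print(m, rank(data, method=m))
# average [1.5, 3.0, 1.5, 4.0]
# min [1, 3, 1, 4]
# max [2, 3, 2, 4]
# first [1, 3, 2, 4]
# dense [1, 2, 1, 3]
```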
-
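rename(name, inplace=False)
Alter Index name.
By default, returns a new Index.
- Parameters
- name : label
Name(s) to set.
- inplace : bool, default False
Whether to modify the Index in place instead of returning a new one.
- Returns
- Index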
-
repeat(repeats, axis=None)
Repeats elements consecutively.
Returns a new object of the caller's type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.
- Parameters
- repeats : int, or array of ints
The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.
- Returns
- Series/DataFrame/Index
A newly created object of same type as caller with repeated elements.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})
>>> df
   a   b
0  1  10
1  2  20
2  3  30
>>> df.repeat(3)
   a   b
0  1  10
0  1  10
0  1  10
1  2  20
1  2  20
1  2  20
2  3  30
2  3  30
2  3  30
Repeat on Series:
>>> s = cudf.Series([0, 2])
>>> s
0    0
1    2
dtype: int64
>>> s.repeat([3, 4])
0    0
0    0
0    0
1    2
1    2
1    2
1    2
dtype: int64
>>> s.repeat(2)
0    0
0    0
1    2
1    2
dtype: int64
Repeat on Index:
>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33, 33, 33, 33, 33, 55, 55, 55, 55, 55], dtype='int64')
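The scalar and per-element forms of `repeats` can be sketched on a plain list. The validation rules here (matching length, non-negative counts) follow the parameter description above. `repeat` is a hypothetical standalone function, not the cuDF method.

```python
def repeat(values, repeats):
    """Hypothetical sketch of repeat() with a scalar or per-element count."""
    if isinstance(repeats, int):
        repeats = [repeats] * len(values)
    if len(repeats) != len(values):
        raise ValueError("repeats must be an int or match the number of elements")
    out = []
    for value, times in zip(values, repeats):
        if times < 0:
            raise ValueError("repeats must be non-negative")
        out.extend([value] * times)  # each element repeated consecutively
    return out


print(repeat([0, 2], [3, 4]))  # [0, 0, 0, 2, 2, 2, 2]
print(repeat([0, 2], 2))       # [0, 0, 2, 2]
print(repeat([0, 2], 0))       # []
```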
-
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)
Return a random sample of items from an axis of object.
You can use random_state for reproducibility.
- Parameters
- n : int, optional
Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.
- frac : float, optional
Fraction of axis items to return. Cannot be used with n.
- replace : bool, default False
Allow or disallow sampling of the same row more than once. replace == True is not yet supported for axis = 1/‘columns’.
- weights : str or ndarray-like, optional
Only supported for axis=1/‘columns’.
- random_state : int or None, default None
Seed for the random number generator (if int), or None. If None, a random seed will be chosen.
- axis : {0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index don’t support axis=1.
- Returns
- Series or DataFrame or Index
A new object of same type as caller containing n items randomly sampled from the caller object.
Examples
>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1

>>> sr = cudf.Series([1, 2, 3, 4, 5])
>>> sr.sample(10, replace=True)
1    4
3    1
2    4
0    5
0    1
4    5
4    1
0    2
0    3
3    2
dtype: int64

>>> df = cudf.DataFrame(
...     {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]})
>>> df.sample(2, axis=1)
   a  c
0  1  3
1  2  4
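How a seed makes sampling reproducible, and how `replace` changes the draw, can be sketched with Python's `random.Random`. This is only an analogy: cuDF uses its own generator, so the same seed here will not select the same rows as in cuDF. `sample_positions` is a hypothetical helper.

```python
import random


def sample_positions(n_rows, n, replace=False, random_state=None):
    """Hypothetical sketch of row selection for sample() using Python's RNG."""
    rng = random.Random(random_state)  # seeded generator -> reproducible draws
    if replace:
        # with replacement, the same position may be drawn more than once
        return [rng.randrange(n_rows) for _ in range(n)]
    # without replacement, n must not exceed n_rows (random.sample raises otherwise)
    return rng.sample(range(n_rows), n)


first = sample_positions(5, 3, random_state=42)
second = sample_positions(5, 3, random_state=42)
print(first == second)  # True: same seed, same positions
print(len(sample_positions(5, 10, replace=True, random_state=0)))  # 10
```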
-
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)
Scatter to a list of dataframes.
Uses map_index to determine the destination of each row of the original DataFrame.
- Parameters
- map_index : Series, str or list-like
Scatter assignment for each row.
- map_size : int
Length of output list. Must be greater than or equal to the number of unique values in map_index.
- keep_index : bool
Preserve the original index values for each row.
- Returns
- A list of cudf.DataFrame objects.
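The routing idea can be sketched in plain Python, assuming `map_index` holds integer partition ids in `[0, map_size)` and that rows keep their relative order within a partition. Both are assumptions of this sketch; `scatter_by_map` here is a hypothetical standalone function working on lists, not the cuDF method.

```python
def scatter_by_map(rows, map_index, map_size=None):
    """Hypothetical sketch: route each row to the partition named in map_index."""
    if map_size is None:
        map_size = max(map_index) + 1
    partitions = [[] for _ in range(map_size)]
    for row, dest in zip(rows, map_index):
        if not 0 <= dest < map_size:
            raise ValueError("map_index value %r is outside [0, map_size)" % dest)
        partitions[dest].append(row)
    return partitions


print(scatter_by_map(["r0", "r1", "r2", "r3"], [1, 0, 1, 2]))
# [['r1'], ['r0', 'r2'], ['r3']]
```

Passing a larger `map_size` simply yields extra empty partitions, which is why it must be at least the number of distinct destinations.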
-
searchsorted(values, side='left', ascending=True, na_position='last')
Find indices where elements should be inserted to maintain order.
- Parameters
- values : Frame (shape must be consistent with self)
Values to be hypothetically inserted into self.
- side : str {‘left’, ‘right’}, optional, default ‘left’
If ‘left’, return the index of the first suitable location found. If ‘right’, return the last such index.
- ascending : bool, optional, default True
Whether the sorted Frame is in ascending order (otherwise descending).
- na_position : str {‘last’, ‘first’}, optional, default ‘last’
Position of null values in sorted order.
- Returns
- 1-D cupy array of insertion points
Examples
>>> import cudf
>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)
If the values are not monotonically sorted, wrong locations may be returned:
>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0  # wrong result, correct would be 1

>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
...                             'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
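For ascending data without nulls, `side='left'` and `side='right'` correspond to Python's `bisect_left` and `bisect_right`. The sketch below reproduces the Series results above and also compares multi-column rows as tuples, assuming lexicographic row comparison. `searchsorted` here is a hypothetical helper and does not model `ascending=False` or `na_position`.

```python
from bisect import bisect_left, bisect_right


def searchsorted(sorted_values, values, side="left"):
    """Hypothetical sketch of searchsorted() for ascending data via bisect."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    find = bisect_left if side == "left" else bisect_right
    return [find(sorted_values, v) for v in values]


print(searchsorted([1, 2, 3], [0, 4]))                # [0, 3]
print(searchsorted([1, 2, 3], [1, 3], side="right"))  # [1, 3]
# Rows of a multi-column frame compared lexicographically as tuples.
rows = list(zip([1, 3, 5, 7], [10, 12, 14, 16]))
print(searchsorted(rows, [(0, 10), (2, 11), (5, 13), (6, 15)]))  # [0, 1, 2, 3]
# Unsorted input gives a meaningless answer, as in the cuDF example.
print(searchsorted([2, 1, 3], [1]))                   # [0]
```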
-
property
shape
Returns a tuple representing the dimensionality of the Index.
-
shift(periods=1, freq=None, axis=0, fill_value=None)
Shift values by periods positions.
-
sin()
Get trigonometric sine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.sin()
0    0.000000
1    0.318683
2    0.479426
3    0.850904
4    0.893997
5   -0.801153
6    0.958916
dtype: float64
sin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.sin()
      first    second
0  0.000000 -0.506366
1 -0.958924  0.958916
2 -0.544021 -0.544072
3  0.650288 -0.999756
sin operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.sin()
Float64Index([-0.3894183423086505, -0.5063656411097588, 0.8011526357338306, 0.8939966636005579], dtype='float64')
-
property
size
Return the number of elements in the underlying data.
- Returns
- size : int
Size of the DataFrame / Index / Series / MultiIndex.
Examples
Size of an empty dataframe is 0.
>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0
DataFrame with values:
>>> df = cudf.DataFrame({'a': [10, 11, 12],
...         'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3
Size of an Index:
>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Int64Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4
Size of a MultiIndex:
>>> midx = cudf.MultiIndex(
...                 levels=[["a", "b", "c", None], ["1", None, "5"]],
...                 codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...                 names=["x", "y"],
...             )
>>> midx
MultiIndex(levels=[0       a
1       b
2       c
3    None
dtype: object, 0       1
1    None
2       5
dtype: object],
codes=   x  y
0  0  0
1  0  2
2  1  1
3  2  1
4  3  0)
>>> midx.size
5
-
sort_values(return_indexer=False, ascending=True, key=None)
Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
- Parameters
- return_indexer : bool, default False
Whether to also return the indices that would sort the index.
- ascending : bool, default True
Whether to sort the index values in ascending order.
- key : None, optional
This parameter is NON-FUNCTIONAL.
- Returns
- sorted_index : Index
Sorted copy of the index.
- indexer : cupy.ndarray, optional
The indices that the index itself was sorted by.
See also
cudf.core.series.Series.sort_values : Sort values of a Series.
cudf.core.dataframe.DataFrame.sort_values : Sort values in a DataFrame.
Examples
>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')
Sort values in ascending order (default behavior).
>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')
Sort values in descending order, and also get the indices idx was sorted by.
>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2], dtype=int32))
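The relationship between the sorted values and the returned indexer can be sketched in plain Python: the indexer is the permutation of positions that, applied to the original values, yields the sorted result. `sort_values` here is a hypothetical standalone function; cuDF returns an Index and a CuPy array rather than lists.

```python
def sort_values(values, ascending=True, return_indexer=False):
    """Hypothetical sketch: sort a list and optionally return the permutation."""
    # sorted() is stable, so equal values keep their original relative order
    indexer = sorted(range(len(values)), key=values.__getitem__, reverse=not ascending)
    result = [values[i] for i in indexer]
    return (result, indexer) if return_indexer else result


print(sort_values([10, 100, 1, 1000]))  # [1, 10, 100, 1000]
print(sort_values([10, 100, 1, 1000], ascending=False, return_indexer=True))
# ([1000, 100, 10, 1], [3, 1, 0, 2])
```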
-
sqrt()
Get the non-negative square-root of all elements, element-wise.
- Returns
- DataFrame/Series/Index
Result of the non-negative square-root of each element.
Examples
>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64
sqrt operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-10.0, 100, 625],
...                      'second': [1, 2, 0.4]})
>>> df
   first  second
0  -10.0     1.0
1  100.0     2.0
2  625.0     0.4
>>> df.sqrt()
   first    second
0    NaN  1.000000
1   10.0  1.414214
2   25.0  0.632456
sqrt operation on Index:
>>> index = cudf.Index([-10.0, 100, 625])
>>> index
Float64Index([-10.0, 100.0, 625.0], dtype='float64')
>>> index.sqrt()
Float64Index([nan, 10.0, 25.0], dtype='float64')
-
sum()
Return the sum of all values of the Index.
- Returns
- scalar
Sum of all values.
Examples
>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.sum()
6
-
take(indices)
Gather only the specified subset of indices.
- Parameters
- indices : array-like
An array-like that maps to values contained in this Index.
-
tan()
Get trigonometric tangent, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.tan()
0    0.000000
1    0.336213
2    0.546302
3    1.619775
4   -1.995200
5    1.338690
6   -3.380140
dtype: float64
tan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.tan()
      first     second
0  0.000000  -0.587214
1 -3.380515  -3.380140
2  0.648361   0.648446
3 -0.855993  45.244742
tan operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.tan()
Float64Index([-0.4227932187381618, -0.587213915156929, -1.3386902103511544, -1.995200412208242], dtype='float64')
-
tile(count)
Repeats the rows of this DataFrame `count` times to form a new DataFrame.
- Parameters
- count : Number of times to tile “rows”. Must be non-negative.
- Returns
- The table containing the tiled “rows”.
Examples
>>> import cudf
>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
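tile() repeats the whole block of rows, whereas repeat() (documented above) repeats each row consecutively. The plain-Python sketch below contrasts the two orderings on a list of rows; both functions are hypothetical stand-ins, not cuDF methods.

```python
def tile(rows, count):
    """Hypothetical sketch: repeat the whole block of rows `count` times."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return [row for _ in range(count) for row in rows]


def repeat_rows(rows, count):
    """For contrast: repeat() repeats each row consecutively instead."""
    return [row for row in rows for _ in range(count)]


rows = [[8, 4, 7], [5, 2, 3]]
print(tile(rows, 2))         # [[8, 4, 7], [5, 2, 3], [8, 4, 7], [5, 2, 3]]
print(repeat_rows(rows, 2))  # [[8, 4, 7], [8, 4, 7], [5, 2, 3], [5, 2, 3]]
```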
-
to_array(fillna=None)
Get a dense numpy array for the data.
- Parameters
- fillna : str or None
Defaults to None, which will skip null values. If it equals "pandas", null values are filled with NaNs. Non integral dtype is promoted to np.float64.
Notes
If fillna is None, null values are skipped. Therefore, the output may contain fewer elements than the input.
-
to_arrow()
Convert Index to a PyArrow Array.
Examples
>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx.to_arrow()
<pyarrow.lib.Int64Array object at 0x7fcaa6f53440>
[
  -3,
  10,
  15,
  20
]
-
to_dlpack()
Converts a cuDF object into a DLPack tensor.
DLPack is an open-source memory tensor structure: dmlc/dlpack.
This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep-copies the data from the cuDF object into the DLPack tensor.
- Parameters
- cudf_obj : DataFrame, Series, Index, or Column
- Returns
- pycapsule_obj : PyCapsule
Output DLPack tensor pointer which is encapsulated in a PyCapsule object.
-
to_pandas(**kwargs)
Convert to a Pandas Index.
Examples
>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.numeric.Int64Index'>
>>> type(idx)
<class 'cudf.core.index.GenericIndex'>
-
to_series(index=None, name=None)
Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.
- Parameters
- index : Index, optional
Index of resulting Series. If None, defaults to original index.
- name : str, optional
Name of resulting Series. If None, defaults to name of original index.
- Returns
- Series
The dtype will be based on the type of the Index values.
-
unique()
Return unique values in the index.
- Returns
- Index without duplicates
-
property
values
Return a CuPy representation of the MultiIndex.
Only the values in the MultiIndex will be returned.
- Returns
- out : cupy.ndarray
The values of the MultiIndex.
Examples
>>> import cudf
>>> midx = cudf.MultiIndex(
...         levels=[[1, 3, 4, 5], [1, 2, 5]],
...         codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...         names=["x", "y"],
...     )
>>> midx.values
array([[1, 1],
       [1, 5],
       [3, 2],
       [4, 2],
       [5, 1]])
>>> type(midx.values)
<class 'cupy.core.core.ndarray'>
-
property
values_host
Return a numpy representation of the MultiIndex.
Only the values in the MultiIndex will be returned.
- Returns
- out : numpy.ndarray
The values of the MultiIndex.
Examples
>>> import cudf
>>> midx = cudf.MultiIndex(
...         levels=[[1, 3, 4, 5], [1, 2, 5]],
...         codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...         names=["x", "y"],
...     )
>>> midx.values_host
array([(1, 1), (1, 5), (3, 2), (4, 2), (5, 1)], dtype=object)
>>> type(midx.values_host)
<class 'numpy.ndarray'>
-
where(cond, other=None, inplace=False)
Replace values where the condition is False.
- Parameters
- cond : bool array-like with the same length as self
Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.
- other : scalar, or array-like
Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.
- Returns
- Same type as caller
Examples
>>> import cudf
>>> index = cudf.Index([4, 3, 2, 1, 0])
>>> index
Int64Index([4, 3, 2, 1, 0], dtype='int64')
>>> index.where(index > 2, 15)
Int64Index([4, 3, 15, 15, 15], dtype='int64')
Int8Index
-
class
cudf.core.index.Int8Index(data=None, dtype=None, copy=False, name=None)
Immutable, ordered and sliceable sequence of integer labels. The basic object storing row labels for all cuDF objects.
- Parameters
- data : array-like (1-dimensional)/ DataFrame
If it is a DataFrame, it will return a MultiIndex.
- dtype : NumPy dtype (default: object)
If dtype is None, we find the dtype that best fits the data.
- copy : bool
Make a copy of input data.
- name : object
Name to be stored in the index.
- tupleize_cols : bool (default: True)
When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.
When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.
- Returns
- Index
cudf Index
Examples
>>> import cudf
>>> cudf.Index([1, 2, 3], dtype="uint64", name="a")
UInt64Index([1, 2, 3], dtype='uint64', name='a')

>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]}))
MultiIndex(levels=[0    1
1    2
dtype: int64, 0    2
1    3
dtype: int64],
codes=   a  b
0  0  0
1  1  1)
- Attributes
dtype : dtype of the underlying values in GenericIndex.
empty : Indicator whether Index is empty.
gpu_values : View the data as a Numba device array object.
is_monotonic : Alias for is_monotonic_increasing.
is_monotonic_decreasing : Return whether the index has monotonically decreasing (only equal or decreasing) values.
is_monotonic_increasing : Return whether the index has monotonically increasing (only equal or increasing) values.
is_unique : Return whether the index has unique values.
name : Returns the name of the Index.
names : Returns a tuple containing the name of the Index.
ndim : Dimension of the data.
shape : Returns a tuple representing the dimensionality of the Index.
size : Return the number of elements in the underlying data.
values : Return an array representing the data in the Index.
values_host : Return a numpy representation of the Index.
Methods
acos() : Get trigonometric inverse cosine, element-wise.
any() : Return whether any element is True in the Index.
append(other) : Append a collection of Index objects together.
argsort([ascending]) : Return the integer indices that would sort the index.
asin() : Get trigonometric inverse sine, element-wise.
astype(dtype[, copy]) : Create an Index with values cast to dtypes.
atan() : Get trigonometric inverse tangent, element-wise.
clip([lower, upper, inplace, axis]) : Trim values at input threshold(s).
copy([deep]) : Make a copy of this object.
cos() : Get trigonometric cosine, element-wise.
difference(other[, sort]) : Return a new Index with elements from the index that are not in other.
drop_duplicates([keep]) : Return Index with duplicate values removed.
dropna([how]) : Return an Index with null values removed.
equals(other) : Determine if two Index objects contain the same elements.
exp() : Get the exponential of all elements, element-wise.
fillna(value[, downcast]) : Fill null values with the specified value.
find_label_range(first, last) : Find range that starts with first and ends with last, inclusively.
from_pandas(index[, nan_as_null]) : Convert from a Pandas Index.
get_level_values(level) : Return an Index of values for requested level.
get_slice_bound(label, side, kind) : Calculate slice bound that corresponds to given label.
interleave_columns() : Interleave Series columns of a table into a single column.
isin(values) : Return a boolean array where the index values are in values.
isna() : Identify missing values.
isnull() : Identify missing values.
join(other[, how, level, return_indexers, sort]) : Compute join_index and indexers to conform data structures to the new index.
log() : Get the natural logarithm of all elements, element-wise.
mask(cond[, other, inplace]) : Replace values where the condition is True.
max() : Return the maximum value of the Index.
memory_usage([deep]) : Memory usage of the values.
min() : Return the minimum value of the Index.
notna() : Identify non-missing values.
notnull() : Identify non-missing values.
rank([axis, method, numeric_only, …]) : Compute numerical data ranks (1 through n) along axis.
rename(name[, inplace]) : Alter Index name.
repeat(repeats[, axis]) : Repeats elements consecutively.
sample([n, frac, replace, weights, …]) : Return a random sample of items from an axis of object.
scatter_by_map(map_index[, map_size, keep_index]) : Scatter to a list of dataframes.
searchsorted(values[, side, ascending, …]) : Find indices where elements should be inserted to maintain order.
shift([periods, freq, axis, fill_value]) : Shift values by periods positions.
sin() : Get trigonometric sine, element-wise.
sort_values([return_indexer, ascending, key]) : Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
sqrt() : Get the non-negative square-root of all elements, element-wise.
sum() : Return the sum of all values of the Index.
take(indices) : Gather only the specified subset of indices.
tan() : Get trigonometric tangent, element-wise.
tile(count) : Repeats the rows of this DataFrame count times to form a new DataFrame.
to_array([fillna]) : Get a dense numpy array for the data.
to_arrow() : Convert Index to a PyArrow Array.
to_dlpack() : Converts a cuDF object into a DLPack tensor.
to_frame([index, name]) : Create a DataFrame with a column containing this Index.
to_pandas(**kwargs) : Convert to a Pandas Index.
to_series([index, name]) : Create a Series with both index and values equal to the index keys.
unique() : Return unique values in the index.
where(cond[, other]) : Replace values where the condition is False.
replace
-
acos()
Get trigonometric inverse cosine, element-wise.
The inverse of cos so that, if y = x.cos(), then x = y.acos() for x in [0, π].
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64
acos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629
acos operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0, 1.5707963267948966, 1.266103672779499], dtype='float64')
-
any()
Return whether any element is True in the Index.
-
append(other)
Append a collection of Index objects together.
- Parameters
- other : Index or list/tuple of indices
- Returns
- appended : Index
Examples
>>> import cudf >>> idx = cudf.Index([1, 2, 10, 100]) >>> idx Int64Index([1, 2, 10, 100], dtype='int64') >>> other = cudf.Index([200, 400, 50]) >>> other Int64Index([200, 400, 50], dtype='int64') >>> idx.append(other) Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')
append also accepts a list of Index objects:
>>> idx.append([other, other]) Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
-
argsort(ascending=True, **kwargs)¶ Return the integer indices that would sort the index.
- Parameters
- ascendingbool, default True
If True, returns the indices for ascending order. If False, returns the indices for descending order.
- Returns
- arrayA CuPy array of integer indices that would sort the index if used as an indexer.
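The following minimal pure-Python sketch shows what "indices that would sort the index" means. It is not cuDF's implementation: cuDF returns a CuPy array, and this section does not specify how cuDF orders ties. The sketch breaks ties stably, and the helper name `argsort` is hypothetical.

```python
from typing import List, Sequence


def argsort(values: Sequence, ascending: bool = True) -> List[int]:
    """Return the positions that would sort `values` (stable CPU sketch)."""
    # sorted() is stable; with reverse=True, tied elements keep their original order.
    return sorted(range(len(values)), key=lambda i: values[i], reverse=not ascending)


idx = [10, 100, 1, 1000]
order = argsort(idx)
print(order)                          # [2, 0, 1, 3]
# Using the result as an indexer yields the sorted values.
print([idx[i] for i in order])        # [1, 10, 100, 1000]
print(argsort(idx, ascending=False))  # [3, 1, 0, 2]
```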
-
asin()¶ Get Trigonometric inverse sine, element-wise.
The inverse of sine, so that if y = x.sin(), then x = y.asin() for x in [-π/2, π/2].
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5]) >>> ser.asin() 0 -1.570796 1 0.000000 2 1.570796 3 0.330314 4 0.523599 dtype: float64
asin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5], ... 'second': [0.234, 0.3, 0.1]}) >>> df first second 0 -1.0 0.234 1 0.0 0.300 2 0.5 0.100 >>> df.asin() first second 0 -1.570796 0.236190 1 0.000000 0.304693 2 0.523599 0.100167
asin operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64') >>> index.asin() Float64Index([-1.5707963267948966, 0.41151684606748806, 1.5707963267948966, 0.3046926540153975], dtype='float64')
-
astype(dtype, copy=False)¶ Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.
- Parameters
- dtypenumpy dtype
Use a numpy.dtype to cast entire Index object to.
- copybool, default False
If copy is True, astype always returns a newly allocated object. If copy is False (the default) and internal requirements on dtype are satisfied, the original data is used to create a new Index, or the original Index is returned.
- Returns
- Index
Index with values cast to specified dtype.
-
atan()¶ Get Trigonometric inverse tangent, element-wise.
The inverse of tan, so that if y = x.tan(), then x = y.atan() for x in (-π/2, π/2).
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10]) >>> ser 0 -1.00000 1 0.00000 2 1.00000 3 0.32434 4 0.50000 5 -10.00000 dtype: float64 >>> ser.atan() 0 -0.785398 1 0.000000 2 0.785398 3 0.313635 4 0.463648 5 -1.471128 dtype: float64
atan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5], ... 'second': [0.234, 0.3, 10]}) >>> df first second 0 -1.0 0.234 1 -10.0 0.300 2 0.5 10.000 >>> df.atan() first second 0 -0.785398 0.229864 1 -1.471128 0.291457 2 0.463648 1.471128
atan operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64') >>> index.atan() Float64Index([-0.7853981633974483, 0.3805063771123649, 0.7853981633974483, 0.0, 0.2914567944778671], dtype='float64')
-
clip(lower=None, upper=None, inplace=False, axis=1)¶ Trim values at input threshold(s).
Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.
- Parameters
- lowerscalar or array_like, default None
Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.
- upperscalar or array_like, default None
Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series/Index, upper is expected to be a scalar or an array of size 1.
- inplacebool, default False
Whether to perform the operation in place on the data.
- Returns
- Clipped DataFrame/Series/Index/MultiIndex
Examples
>>> import cudf >>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']}) >>> df.clip(lower=[2, 'b'], upper=[3, 'c']) a b 0 2 b 1 2 b 2 3 c 3 3 c
>>> df.clip(lower=None, upper=[3, 'c']) a b 0 1 a 1 2 b 2 3 c 3 3 c
>>> df.clip(lower=[2, 'b'], upper=None) a b 0 2 b 1 2 b 2 3 c 3 4 d
>>> df.clip(lower=2, upper=3, inplace=True) >>> df a b 0 2 2 1 2 3 2 3 3 3 3 3
>>> import cudf >>> sr = cudf.Series([1, 2, 3, 4]) >>> sr.clip(lower=2, upper=3) 0 2 1 2 2 3 3 3 dtype: int64
>>> sr.clip(lower=None, upper=3) 0 1 1 2 2 3 3 3 dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True) >>> sr 0 2 1 2 2 3 3 4 dtype: int64
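The following minimal pure-Python sketch shows the scalar-threshold clipping rule on a plain list. It is not cuDF code, and the helper name `clip` is hypothetical. Values below lower are raised to lower, values above upper are lowered to upper, and a None bound disables that side of the clip. Because it only compares values, the same rule also applies to strings, as in the DataFrame example.

```python
from typing import Any, List, Optional, Sequence


def clip(values: Sequence[Any], lower: Optional[Any] = None,
         upper: Optional[Any] = None) -> List[Any]:
    """Clip each value into [lower, upper]; a None bound is ignored."""
    out = []
    for v in values:
        if lower is not None and v < lower:
            v = lower  # raise values below the lower threshold
        if upper is not None and v > upper:
            v = upper  # lower values above the upper threshold
        out.append(v)
    return out


print(clip([1, 2, 3, 4], lower=2, upper=3))              # [2, 2, 3, 3]
print(clip([1, 2, 3, 4], upper=3))                       # [1, 2, 3, 3]
print(clip(['a', 'b', 'c', 'd'], lower='b', upper='c'))  # ['b', 'b', 'c', 'c']
```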
-
copy(deep=True)¶ Make a copy of this object.
- Parameters
- deepbool, default True
Make a deep copy of the data. With deep=False, the data is not copied.
- Returns
- copyIndex
-
cos()¶ Get Trigonometric cosine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360]) >>> ser 0 0.00000 1 0.32434 2 0.50000 3 45.00000 4 90.00000 5 180.00000 6 360.00000 dtype: float64 >>> ser.cos() 0 1.000000 1 0.947861 2 0.877583 3 0.525322 4 -0.448074 5 -0.598460 6 -0.283691 dtype: float64
cos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15], ... 'second': [100.0, 360, 720, 300]}) >>> df first second 0 0.0 100.0 1 5.0 360.0 2 10.0 720.0 3 15.0 300.0 >>> df.cos() first second 0 1.000000 0.862319 1 0.283662 -0.283691 2 -0.839072 -0.839039 3 -0.759688 -0.022097
cos operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90]) >>> index Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64') >>> index.cos() Float64Index([ 0.9210609940028851, 0.8623188722876839, -0.5984600690578581, -0.4480736161291701], dtype='float64')
-
difference(other, sort=None)¶ Return a new Index with elements from the index that are not in other.
This is the set difference of two Index objects.
- Parameters
- otherIndex or array-like
- sortFalse or None, default None
Whether to sort the resulting index. By default, the values are attempted to be sorted, but any TypeError from incomparable elements is caught by cudf.
None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.
False : Do not sort the result.
- Returns
- differenceIndex
Examples
>>> import cudf >>> idx1 = cudf.Index([2, 1, 3, 4]) >>> idx1 Int64Index([2, 1, 3, 4], dtype='int64') >>> idx2 = cudf.Index([3, 4, 5, 6]) >>> idx2 Int64Index([3, 4, 5, 6], dtype='int64') >>> idx1.difference(idx2) Int64Index([1, 2], dtype='int64') >>> idx1.difference(idx2, sort=False) Int64Index([2, 1], dtype='int64')
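The sort parameter can be modeled in pure Python as follows. This is a minimal sketch of pandas-style set-difference semantics, not cuDF's implementation, and the helper name `difference` is hypothetical. Elements of other are excluded, duplicates are dropped, and the result is sorted only when sort=None and the elements are comparable.

```python
from typing import Any, List, Optional, Sequence


def difference(left: Sequence[Any], right: Sequence[Any],
               sort: Optional[bool] = None) -> List[Any]:
    """Unique elements of `left` not in `right`, in first-seen order."""
    exclude = set(right)
    seen = set()
    result = []
    for v in left:
        if v not in exclude and v not in seen:
            seen.add(v)
            result.append(v)
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            pass  # incomparable elements: keep the first-seen order
    return result


print(difference([2, 1, 3, 4], [3, 4, 5, 6]))              # [1, 2]
print(difference([2, 1, 3, 4], [3, 4, 5, 6], sort=False))  # [2, 1]
```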
-
drop_duplicates(keep='first')¶ Return Index with duplicate values removed.
- Parameters
- keep{‘first’, ‘last’, False}, default ‘first’
‘first’ : Drop duplicates except for the first occurrence.
‘last’ : Drop duplicates except for the last occurrence.
False : Drop all duplicates.
- Returns
- deduplicatedIndex
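The following minimal pure-Python sketch illustrates the three keep options, with the helper name `drop_duplicates` being hypothetical. It keeps survivors in their original positional order, as pandas does. This section does not document the order of cuDF's output, so cuDF's result may be ordered differently.

```python
from collections import Counter
from typing import Any, List, Sequence, Union


def drop_duplicates(values: Sequence[Any],
                    keep: Union[str, bool] = "first") -> List[Any]:
    """Remove duplicates, keeping the first, last, or no occurrence."""
    if keep is False:
        counts = Counter(values)
        # Keep only values that never repeat.
        return [v for v in values if counts[v] == 1]
    if keep == "first":
        ordered = list(values)
    elif keep == "last":
        ordered = list(reversed(values))  # scan from the end
    else:
        raise ValueError("keep must be 'first', 'last', or False")
    seen = set()
    out = []
    for v in ordered:
        if v not in seen:
            seen.add(v)
            out.append(v)
    if keep == "last":
        out.reverse()  # restore original positional order
    return out


data = [1, 2, 1, 3, 2]
print(drop_duplicates(data))               # [1, 2, 3]
print(drop_duplicates(data, keep="last"))  # [1, 3, 2]
print(drop_duplicates(data, keep=False))   # [3]
```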
-
dropna(how='any')¶ Return an Index with null values removed.
- Parameters
- how{‘any’, ‘all’}, default ‘any’
If the Index is a MultiIndex, drop the value when any or all levels are NaN.
- Returns
- validIndex
Examples
>>> import cudf >>> index = cudf.Index(['a', None, 'b', 'c']) >>> index StringIndex(['a' None 'b' 'c'], dtype='object') >>> index.dropna() StringIndex(['a' 'b' 'c'], dtype='object')
Using dropna on a MultiIndex:
>>> midx = cudf.MultiIndex( ... levels=[[1, None, 4, None], [1, 2, 5]], ... codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]], ... names=["x", "y"], ... ) >>> midx MultiIndex(levels=[0 1 1 null 2 4 3 null dtype: int64, 0 1 1 2 2 5 dtype: int64], codes= x y 0 0 0 1 0 2 2 1 1 3 2 1 4 3 0) >>> midx.dropna() MultiIndex(levels=[0 1 1 4 dtype: int64, 0 1 1 2 2 5 dtype: int64], codes= x y 0 0 0 1 0 2 2 1 1)
-
property
dtype¶ dtype of the underlying values in GenericIndex.
-
property
empty¶ Indicator whether Index is empty.
True if Index is entirely empty (no items).
- Returns
- outbool
If Index is empty, return True; otherwise, return False.
-
equals(other)¶ Determine if two Index objects contain the same elements.
- Returns
- out: bool
True if “other” is an Index and it has the same elements as calling index; False otherwise.
-
exp()¶ Get the exponential of all elements, element-wise.
Exponential is the inverse of the log function, so that x.exp().log() = x
- Returns
- DataFrame/Series/Index
Result of the element-wise exponential.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100]) >>> ser 0 -1.00000 1 0.00000 2 1.00000 3 0.32434 4 0.50000 5 -10.00000 6 100.00000 dtype: float64 >>> ser.exp() 0 3.678794e-01 1 1.000000e+00 2 2.718282e+00 3 1.383117e+00 4 1.648721e+00 5 4.539993e-05 6 2.688117e+43 dtype: float64
exp operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5], ... 'second': [0.234, 0.3, 10]}) >>> df first second 0 -1.0 0.234 1 -10.0 0.300 2 0.5 10.000 >>> df.exp() first second 0 0.367879 1.263644 1 0.000045 1.349859 2 1.648721 22026.465795
exp operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64') >>> index.exp() Float64Index([0.36787944117144233, 1.4918246976412703, 2.718281828459045, 1.0, 1.3498588075760032], dtype='float64')
-
fillna(value, downcast=None)¶ Fill null values with the specified value.
- Parameters
- valuescalar
Scalar value to use to fill nulls. This value cannot be list-like.
- downcastdict, default is None
This parameter is currently NON-FUNCTIONAL.
- Returns
- filledIndex
Examples
>>> import cudf >>> index = cudf.Index([1, 2, None, 4]) >>> index Int64Index([1, 2, null, 4], dtype='int64') >>> index.fillna(3) Int64Index([1, 2, 3, 4], dtype='int64')
-
find_label_range(first, last)¶ Find range that starts with first and ends with last, inclusively.
- Returns
- begin, end2-tuple of int
The starting index and the ending index. The last value occurs at position end - 1.
-
classmethod
from_pandas(index, nan_as_null=None)¶ Convert from a Pandas Index.
- Parameters
- indexPandas Index object
A Pandas Index object which has to be converted to cuDF Index.
- nan_as_nullbool, Default None
If None or True, converts np.nan values to null values. If False, leaves np.nan values as is.
- Raises
- TypeError for invalid input type.
Examples
>>> import cudf >>> import pandas as pd >>> import numpy as np >>> data = [10, 20, 30, np.nan] >>> pdi = pd.Index(data) >>> cudf.core.index.Index.from_pandas(pdi) Index(['10.0', '20.0', '30.0', 'null'], dtype='object') >>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False) Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
-
get_level_values(level)¶ Return an Index of values for requested level.
This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.
- Parameters
- levelint or str
It is either the integer position or the name of the level.
- Returns
- Index
Calling object, as there is only one level in the Index.
See also
cudf.core.multiindex.MultiIndex.get_level_valuesGet values for a level of a MultiIndex.
Notes
For Index, level should be 0, since there are no multiple levels.
Examples
>>> import cudf >>> idx = cudf.core.index.StringIndex(["a","b","c"]) >>> idx.get_level_values(0) StringIndex(['a' 'b' 'c'], dtype='object')
-
get_slice_bound(label, side, kind)¶ Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.
- Parameters
- labelobject
- side{‘left’, ‘right’}
- kind{‘ix’, ‘loc’, ‘getitem’}
- Returns
- int
Index of label.
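For a monotonically increasing index, the left and right bounds correspond to binary-search insertion points. The following minimal pure-Python sketch shows this using the standard `bisect` module. It assumes sorted labels, it ignores the kind argument, and the helper name `get_slice_bound` is hypothetical in this context.

```python
from bisect import bisect_left, bisect_right
from typing import Any, Sequence


def get_slice_bound(sorted_labels: Sequence[Any], label: Any, side: str) -> int:
    """Leftmost position of `label`, or one past its rightmost position."""
    if side == "left":
        return bisect_left(sorted_labels, label)
    if side == "right":
        return bisect_right(sorted_labels, label)
    raise ValueError("side must be 'left' or 'right'")


labels = [1, 2, 2, 3]
print(get_slice_bound(labels, 2, "left"))   # 1  (first 2 is at position 1)
print(get_slice_bound(labels, 2, "right"))  # 3  (one past the last 2)
```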
-
property
gpu_values¶ View the data as a numba device array object
-
interleave_columns()¶ Interleave Series columns of a table into a single column.
Converts the column major table cols into a row major column.
- Parameters
- colsinput Table containing columns to interleave.
- Returns
- The interleaved columns as a single column
Examples
>>> import cudf
>>> df = cudf.DataFrame({0: ['A1', 'A2', 'A3'], 1: ['B1', 'B2', 'B3']})
>>> df
    0   1
0  A1  B1
1  A2  B2
2  A3  B3
>>> df.interleave_columns()
0    A1
1    B1
2    A2
3    B2
4    A3
5    B3
dtype: object
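Interleaving reads a column-major table in row-major order. For each row, it emits that row's value from every column in turn. The following minimal pure-Python sketch, which represents columns as plain lists, reproduces the expected A1, B1, A2, B2, A3, B3 output from the example. The helper name `interleave_columns` is hypothetical here.

```python
from typing import Any, List, Sequence


def interleave_columns(columns: Sequence[Sequence[Any]]) -> List[Any]:
    """Flatten equal-length columns into one list, row by row."""
    if not columns:
        return []
    n_rows = len(columns[0])
    if any(len(col) != n_rows for col in columns):
        raise ValueError("all columns must have the same length")
    # Row-major walk: row 0 of every column, then row 1, and so on.
    return [col[row] for row in range(n_rows) for col in columns]


print(interleave_columns([['A1', 'A2', 'A3'], ['B1', 'B2', 'B3']]))
# ['A1', 'B1', 'A2', 'B2', 'A3', 'B3']
```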
-
property
is_monotonic¶ Alias for is_monotonic_increasing.
-
property
is_monotonic_decreasing¶ Return True if the index is monotonic decreasing (only equal or decreasing values).
-
property
is_monotonic_increasing¶ Return True if the index is monotonic increasing (only equal or increasing values).
-
property
is_unique¶ Return True if the index has unique values.
-
isin(values)¶ Return a boolean array where the index values are in values.
Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.
- Parameters
- valuesset, list-like, Index
Sought values.
- Returns
- is_containedcupy array
CuPy array of boolean values.
-
isna()¶ Identify missing values. Alias for isnull.
-
isnull()¶ Identify missing values.
-
join(other, how='left', level=None, return_indexers=False, sort=False)¶ Compute join_index and indexers to conform data structures to the new index.
- Parameters
- otherIndex.
- how{‘left’, ‘right’, ‘inner’, ‘outer’}
- return_indexersbool, default False
- sortbool, default False
Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).
- Returns: index
Examples
>>> import cudf >>> lhs = cudf.DataFrame( ... {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b'] ... ).index >>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index >>> lhs.join(rhs, how='inner') MultiIndex(levels=[0 1 1 3 dtype: int64, 0 2 1 4 dtype: int64], codes= a b 0 1 1 1 0 0)
-
log()¶ Get the natural logarithm of all elements, element-wise.
Natural logarithm is the inverse of the exp function, so that x.log().exp() = x for x > 0. Negative inputs yield NaN and zero yields -inf, as the examples below show.
- Returns
- DataFrame/Series/Index
Result of the element-wise natural logarithm.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100]) >>> ser 0 -1.00000 1 0.00000 2 1.00000 3 0.32434 4 0.50000 5 -10.00000 6 100.00000 dtype: float64 >>> ser.log() 0 NaN 1 -inf 2 0.000000 3 -1.125963 4 -0.693147 5 NaN 6 4.605170 dtype: float64
log operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5], ... 'second': [0.234, 0.3, 10]}) >>> df first second 0 -1.0 0.234 1 -10.0 0.300 2 0.5 10.000 >>> df.log() first second 0 NaN -1.452434 1 NaN -1.203973 2 -0.693147 2.302585
log operation on Index:
>>> index = cudf.Index([10, 11, 500.0]) >>> index Float64Index([10.0, 11.0, 500.0], dtype='float64') >>> index.log() Float64Index([2.302585092994046, 2.3978952727983707, 6.214608098422191], dtype='float64')
-
mask(cond, other=None, inplace=False)¶ Replace values where the condition is True.
- Parameters
- condbool Series/DataFrame, array-like
Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.
- other: scalar, list of scalars, Series/DataFrame
Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.
DataFrame expects a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.
Series expects a scalar or a Series-like object of the same length.
- inplacebool, default False
Whether to perform the operation in place on the data.
- Returns
- Same type as caller
Examples
>>> import cudf >>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]}) >>> df.mask(df % 2 == 0, [-1, -1]) A B 0 1 3 1 -1 5 2 5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0]) >>> ser.mask(ser > 2, 10) 0 10 1 10 2 2 3 1 4 0 dtype: int64 >>> ser.mask(ser > 2) 0 null 1 null 2 2 3 1 4 0 dtype: int64
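mask is the complement of where. It replaces values where cond is True, while where replaces values where cond is False. The following minimal pure-Python sketch shows that relationship for a scalar `other`. It uses None as a stand-in for a cuDF null, and the helper names `where` and `mask` are hypothetical in this context.

```python
from typing import Any, List, Sequence


def where(values: Sequence[Any], cond: Sequence[bool], other: Any = None) -> List[Any]:
    """Keep values where cond is True; otherwise use `other` (None ~ null)."""
    if len(values) != len(cond):
        raise ValueError("cond must have the same length as values")
    return [v if keep else other for v, keep in zip(values, cond)]


def mask(values: Sequence[Any], cond: Sequence[bool], other: Any = None) -> List[Any]:
    """Replace values where cond is True: where() with the condition negated."""
    return where(values, [not c for c in cond], other)


ser = [4, 3, 2, 1, 0]
print(mask(ser, [v > 2 for v in ser], 10))   # [10, 10, 2, 1, 0]
print(mask(ser, [v > 2 for v in ser]))       # [None, None, 2, 1, 0]
print(where(ser, [v > 2 for v in ser], 15))  # [4, 3, 15, 15, 15]
```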
-
max()¶ Return the maximum value of the Index.
- Returns
- scalar
Maximum value.
See also
Index.minReturn the minimum value in an Index.
cudf.core.series.Series.maxReturn the maximum value in a Series.
cudf.core.dataframe.DataFrame.maxReturn the maximum values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.max() 3
-
memory_usage(deep=False)¶ Memory usage of the values.
- Parameters
- deepbool
Introspect the data deeply, interrogate object dtypes for system-level memory consumption.
- Returns
- bytes used
-
min()¶ Return the minimum value of the Index.
- Returns
- scalar
Minimum value.
See also
Index.maxReturn the maximum value in an Index.
cudf.core.series.Series.minReturn the minimum value in a Series.
cudf.core.dataframe.DataFrame.minReturn the minimum values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.min() 1
-
property
name¶ Returns the name of the Index.
-
property
names¶ Returns a tuple containing the name of the Index.
-
property
ndim¶ Dimension of the data. Apart from MultiIndex, ndim is always 1.
-
notna()¶ Identify non-missing values. Alias for notnull.
-
notnull()¶ Identify non-missing values.
-
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)¶ Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.
- Parameters
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
Index to direct ranking.
- method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’
How to rank the group of records that have the same value (i.e. ties):
* average: average rank of the group
* min: lowest rank in the group
* max: highest rank in the group
* first: ranks assigned in order they appear in the array
* dense: like ‘min’, but rank always increases by 1 between groups.
- numeric_onlybool, optional
For DataFrame objects, rank only numeric columns if set to True.
- na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’
How to rank NaN values:
* keep: assign NaN rank to NaN values
* top: assign smallest rank to NaN values if ascending
* bottom: assign highest rank to NaN values if ascending.
- ascendingbool, default True
Whether or not the elements should be ranked in ascending order.
- pctbool, default False
Whether or not to display the returned rankings in percentile form.
- Returns
- same type as caller
Return a Series or DataFrame with data ranks as values.
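The tie-handling methods differ only in how they label a run of equal values after sorting. The following minimal pure-Python sketch covers the five method options and ascending. It does not handle nulls (na_option) or pct, it is not cuDF's implementation, and the helper name `rank` is hypothetical.

```python
from typing import List, Sequence


def rank(values: Sequence, method: str = "average", ascending: bool = True) -> List[float]:
    """Rank values 1..n with tie handling; nulls and pct are not handled."""
    n = len(values)
    # Stable sort of positions; reverse=True keeps ties in their original order.
    order = sorted(range(n), key=lambda i: values[i], reverse=not ascending)
    ranks = [0.0] * n
    dense_rank = 0
    start = 0
    while start < n:
        # Extend `end` over the run of tied values in sorted order.
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        dense_rank += 1
        for pos in range(start, end + 1):
            i = order[pos]
            if method == "average":
                ranks[i] = (start + end) / 2 + 1
            elif method == "min":
                ranks[i] = float(start + 1)
            elif method == "max":
                ranks[i] = float(end + 1)
            elif method == "first":
                ranks[i] = float(pos + 1)
            elif method == "dense":
                ranks[i] = float(dense_rank)
            else:
                raise ValueError(f"unknown method: {method!r}")
        start = end + 1
    return ranks


data = [10, 20, 10, 30]
print(rank(data))                  # [1.5, 3.0, 1.5, 4.0]
print(rank(data, method="dense"))  # [1.0, 2.0, 1.0, 3.0]
```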
-
rename(name, inplace=False)¶ Alter Index name.
Defaults to returning new index.
- Parameters
- namelabel
Name(s) to set.
- Returns
- Index
-
repeat(repeats, axis=None)¶ Repeats elements consecutively.
Returns a new object of the caller's type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.
- Parameters
- repeatsint, or array of ints
The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.
- Returns
- Series/DataFrame/Index
A newly created object of same type as caller with repeated elements.
Examples
>>> import cudf >>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]}) >>> df a b 0 1 10 1 2 20 2 3 30 >>> df.repeat(3) a b 0 1 10 0 1 10 0 1 10 1 2 20 1 2 20 1 2 20 2 3 30 2 3 30 2 3 30
Repeat on Series
>>> s = cudf.Series([0, 2]) >>> s 0 0 1 2 dtype: int64 >>> s.repeat([3, 4]) 0 0 0 0 0 0 1 2 1 2 1 2 1 2 dtype: int64 >>> s.repeat(2) 0 0 0 0 1 2 1 2 dtype: int64
Repeat on Index
>>> index = cudf.Index([10, 22, 33, 55]) >>> index Int64Index([10, 22, 33, 55], dtype='int64') >>> index.repeat(5) Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33, 33, 33, 33, 33, 55, 55, 55, 55, 55], dtype='int64')
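The following minimal pure-Python sketch shows the repeat rule for both a scalar count and a per-element list of counts. It reproduces the Series results above. It is not cuDF code, and the helper name `repeat` is hypothetical. Each element's copies stay adjacent, which distinguishes repeat from tiling the whole sequence.

```python
from typing import Any, List, Sequence, Union


def repeat(values: Sequence[Any], repeats: Union[int, Sequence[int]]) -> List[Any]:
    """Repeat each element consecutively, by a scalar or per-element count."""
    if isinstance(repeats, int):
        repeats = [repeats] * len(values)
    if len(repeats) != len(values):
        raise ValueError("repeats must be an int or match the length of values")
    out: List[Any] = []
    for v, r in zip(values, repeats):
        if r < 0:
            raise ValueError("repeat counts must be non-negative")
        out.extend([v] * r)  # copies of one element stay adjacent
    return out


print(repeat([0, 2], [3, 4]))  # [0, 0, 0, 2, 2, 2, 2]
print(repeat([0, 2], 2))       # [0, 0, 2, 2]
print(repeat([10, 22], 0))     # []
```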
-
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)¶ Return a random sample of items from an axis of object.
You can use random_state for reproducibility.
- Parameters
- nint, optional
Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.
- fracfloat, optional
Fraction of axis items to return. Cannot be used with n.
- replacebool, default False
Allow or disallow sampling of the same row more than once. replace == True is not yet supported for axis = 1/‘columns’.
- weightsstr or ndarray-like, optional
Only supported for axis = 1/‘columns’.
- random_stateint or None, default None
Seed for the random number generator (if int), or None. If None, a random seed will be chosen.
- axis{0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index don’t support axis=1.
- Returns
- Series or DataFrame or Index
A new object of same type as caller containing n items randomly sampled from the caller object.
Examples
>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1
>>> sr = cudf.Series([1, 2, 3, 4, 5]) >>> sr.sample(10, replace=True) 1 4 3 1 2 4 0 5 0 1 4 5 4 1 0 2 0 3 3 2 dtype: int64
>>> df = cudf.DataFrame( ... {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]}) >>> df.sample(2, axis=1) a c 0 1 3 1 2 4
-
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)¶ Scatter to a list of dataframes.
Uses map_index to determine the destination of each row of the original DataFrame.
- Parameters
- map_indexSeries, str or list-like
Scatter assignment for each row
- map_sizeint
Length of output list. Must be >= the number of unique values in map_index.
- keep_indexbool
Conserve original index values for each row
- Returns
- A list of cudf.DataFrame objects.
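The following minimal pure-Python sketch shows the scatter idea, with rows represented as list items. It assumes that map_index holds integer partition ids in [0, map_size), and it does not model cuDF's option of passing a column name. The helper name `scatter_by_map` is hypothetical in this context. Each row is appended to the output partition named by its map entry.

```python
from typing import Any, List, Optional, Sequence


def scatter_by_map(rows: Sequence[Any], map_index: Sequence[int],
                   map_size: Optional[int] = None) -> List[List[Any]]:
    """Send each row to the partition named by its entry in map_index."""
    if len(rows) != len(map_index):
        raise ValueError("map_index must have one entry per row")
    if map_size is None:
        map_size = max(map_index) + 1 if map_index else 0
    parts: List[List[Any]] = [[] for _ in range(map_size)]
    for row, dest in zip(rows, map_index):
        if not 0 <= dest < map_size:
            raise ValueError(f"destination {dest} is outside [0, {map_size})")
        parts[dest].append(row)  # rows keep their relative order
    return parts


print(scatter_by_map(['a', 'b', 'c', 'd'], [1, 0, 1, 2]))
# [['b'], ['a', 'c'], ['d']]
print(scatter_by_map(['a', 'b'], [0, 0], map_size=3))
# [['a', 'b'], [], []]
```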
-
searchsorted(values, side='left', ascending=True, na_position='last')¶ Find indices where elements should be inserted to maintain order.
- Parameters
- valuesFrame (Shape must be consistent with self)
Values to be hypothetically inserted into self.
- sidestr {‘left’, ‘right’} optional, default ‘left’
If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.
- ascendingbool optional, default True
Sorted Frame is in ascending order (otherwise descending).
- na_positionstr {‘last’, ‘first’} optional, default ‘last’
Position of null values in sorted order.
- Returns
- 1-D cupy array of insertion points
Examples
>>> s = cudf.Series([1, 2, 3]) >>> s.searchsorted(4) 3 >>> s.searchsorted([0, 4]) array([0, 3], dtype=int32) >>> s.searchsorted([1, 3], side='left') array([0, 2], dtype=int32) >>> s.searchsorted([1, 3], side='right') array([1, 3], dtype=int32)
If the values are not monotonically sorted, wrong locations may be returned:
>>> s = cudf.Series([2, 1, 3]) >>> s.searchsorted(1) 0 # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
...                             'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
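For a single ascending sequence, side='left' and side='right' correspond to the standard binary-search insertion points. The following minimal pure-Python sketch uses the `bisect` module to reproduce the Series results above, including the misleading answer for unsorted input. It does not handle descending order, nulls, or multi-column frames, and the helper name `searchsorted` is hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Any, List, Sequence


def searchsorted(sorted_values: Sequence[Any], values: Sequence[Any],
                 side: str = "left") -> List[int]:
    """Insertion points that keep an ascending sequence sorted."""
    if side == "left":
        find = bisect_left
    elif side == "right":
        find = bisect_right
    else:
        raise ValueError("side must be 'left' or 'right'")
    return [find(sorted_values, v) for v in values]


s = [1, 2, 3]
print(searchsorted(s, [0, 4]))                # [0, 3]
print(searchsorted(s, [1, 3], side="left"))   # [0, 2]
print(searchsorted(s, [1, 3], side="right"))  # [1, 3]
# Binary search assumes sorted input; on [2, 1, 3] it reports 0 for 1.
print(searchsorted([2, 1, 3], [1]))           # [0]
```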
-
property
shape¶ Returns a tuple representing the dimensionality of the Index.
-
shift(periods=1, freq=None, axis=0, fill_value=None)¶ Shift values by periods positions.
-
sin()¶ Get Trigonometric sine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360]) >>> ser 0 0.00000 1 0.32434 2 0.50000 3 45.00000 4 90.00000 5 180.00000 6 360.00000 dtype: float64 >>> ser.sin() 0 0.000000 1 0.318683 2 0.479426 3 0.850904 4 0.893997 5 -0.801153 6 0.958916 dtype: float64
sin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15], ... 'second': [100.0, 360, 720, 300]}) >>> df first second 0 0.0 100.0 1 5.0 360.0 2 10.0 720.0 3 15.0 300.0 >>> df.sin() first second 0 0.000000 -0.506366 1 -0.958924 0.958916 2 -0.544021 -0.544072 3 0.650288 -0.999756
sin operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90]) >>> index Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64') >>> index.sin() Float64Index([-0.3894183423086505, -0.5063656411097588, 0.8011526357338306, 0.8939966636005579], dtype='float64')
-
property
size¶ Return the number of elements in the underlying data.
- Returns
- sizeSize of the DataFrame / Index / Series / MultiIndex
Examples
Size of an empty dataframe is 0.
>>> import cudf >>> df = cudf.DataFrame() >>> df Empty DataFrame Columns: [] Index: [] >>> df.size 0 >>> df = cudf.DataFrame(index=[1, 2, 3]) >>> df Empty DataFrame Columns: [] Index: [1, 2, 3] >>> df.size 0
DataFrame with values
>>> df = cudf.DataFrame({'a': [10, 11, 12], ... 'b': ['hello', 'rapids', 'ai']}) >>> df a b 0 10 hello 1 11 rapids 2 12 ai >>> df.size 6 >>> df.index RangeIndex(start=0, stop=3) >>> df.index.size 3
Size of an Index
>>> index = cudf.Index([]) >>> index Float64Index([], dtype='float64') >>> index.size 0 >>> index = cudf.Index([1, 2, 3, 10]) >>> index Int64Index([1, 2, 3, 10], dtype='int64') >>> index.size 4
Size of a MultiIndex
>>> midx = cudf.MultiIndex( ... levels=[["a", "b", "c", None], ["1", None, "5"]], ... codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]], ... names=["x", "y"], ... ) >>> midx MultiIndex(levels=[0 a 1 b 2 c 3 None dtype: object, 0 1 1 None 2 5 dtype: object], codes= x y 0 0 0 1 0 2 2 1 1 3 2 1 4 3 0) >>> midx.size 5
-
sort_values(return_indexer=False, ascending=True, key=None)¶ Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
- Parameters
- return_indexerbool, default False
Should the indices that would sort the index be returned.
- ascendingbool, default True
Should the index values be sorted in an ascending order.
- keyNone, optional
This parameter is NON-FUNCTIONAL.
- Returns
- sorted_indexIndex
Sorted copy of the index.
- indexercupy.ndarray, optional
The indices that the index itself was sorted by.
See also
cudf.core.series.Series.sort_valuesSort values of a Series.
cudf.core.dataframe.DataFrame.sort_valuesSort values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([10, 100, 1, 1000]) >>> idx Int64Index([10, 100, 1, 1000], dtype='int64')
Sort values in ascending order (default behavior):
>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')
Sort values in descending order, and also get the indices idx was sorted by:
>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2], dtype=int32))
-
sqrt()¶ Get the non-negative square-root of all elements, element-wise.
- Returns
- DataFrame/Series/Index
Result of the non-negative square-root of each element.
Examples
>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64
sqrt operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-10.0, 100, 625], ... 'second': [1, 2, 0.4]}) >>> df first second 0 -10.0 1.0 1 100.0 2.0 2 625.0 0.4 >>> df.sqrt() first second 0 NaN 1.000000 1 10.0 1.414214 2 25.0 0.632456
sqrt operation on Index:
>>> index = cudf.Index([-10.0, 100, 625]) >>> index Float64Index([-10.0, 100.0, 625.0], dtype='float64') >>> index.sqrt() Float64Index([nan, 10.0, 25.0], dtype='float64')
-
sum()¶ Return the sum of all values of the Index.
- Returns
- scalar
Sum of all values.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.sum() 6
-
take(indices)¶ Gather only the specific subset of indices
- Parameters
- indices: An array-like that maps to values contained in this Index.
-
tan()¶ Get Trigonometric tangent, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360]) >>> ser 0 0.00000 1 0.32434 2 0.50000 3 45.00000 4 90.00000 5 180.00000 6 360.00000 dtype: float64 >>> ser.tan() 0 0.000000 1 0.336213 2 0.546302 3 1.619775 4 -1.995200 5 1.338690 6 -3.380140 dtype: float64
tan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15], ... 'second': [100.0, 360, 720, 300]}) >>> df first second 0 0.0 100.0 1 5.0 360.0 2 10.0 720.0 3 15.0 300.0 >>> df.tan() first second 0 0.000000 -0.587214 1 -3.380515 -3.380140 2 0.648361 0.648446 3 -0.855993 45.244742
tan operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90]) >>> index Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64') >>> index.tan() Float64Index([-0.4227932187381618, -0.587213915156929, -1.3386902103511544, -1.995200412208242], dtype='float64')
-
tile(count)¶ Repeat the rows of this DataFrame count times to form a new DataFrame.
- Parameters
- selfInput Table containing the rows to tile.
- countNumber of times to tile “rows”. Must be non-negative.
- Returns
- The table containing the tiled “rows”.
Examples
>>> import cudf
>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
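Tiling repeats the whole block of rows, while repeat duplicates each element in place. The following minimal pure-Python sketch shows the tiling rule, with rows represented as lists. It is not cuDF code, and the helper name `tile` is hypothetical. It copies each row so that the output does not share list objects with the input.

```python
from typing import Any, List, Sequence


def tile(rows: Sequence[Sequence[Any]], count: int) -> List[List[Any]]:
    """Stack `count` copies of the full row block, one after another."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return [list(row) for _ in range(count) for row in rows]


rows = [[8, 4, 7], [5, 2, 3]]
print(tile(rows, 2))
# [[8, 4, 7], [5, 2, 3], [8, 4, 7], [5, 2, 3]]
# By contrast, repeating each row twice would give row0, row0, row1, row1.
```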
-
to_array(fillna=None)¶ Get a dense numpy array for the data.
- Parameters
- fillnastr or None
Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.
Notes
If fillna is None, null values are skipped. Therefore, the output size could be smaller.
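The two fillna modes change the output length differently. The following minimal pure-Python sketch uses None as a stand-in for a cuDF null. To keep the example simple, it converts every value to float so that NaN can be represented. It does not reproduce the exact dtype-promotion rule stated above, and the helper name `to_array` is hypothetical in this context.

```python
import math
from typing import Any, List, Optional, Sequence


def to_array(values: Sequence[Any], fillna: Optional[str] = None) -> List[Any]:
    """None skips nulls; 'pandas' keeps the length and fills nulls with NaN."""
    if fillna is None:
        return [v for v in values if v is not None]  # output may be shorter
    if fillna == "pandas":
        return [math.nan if v is None else float(v) for v in values]
    raise ValueError("fillna must be None or 'pandas'")


print(to_array([1, None, 3]))                   # [1, 3]
print(to_array([1, None, 3], fillna="pandas"))  # [1.0, nan, 3.0]
```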
-
to_arrow()¶ Convert Index to a PyArrow Array.
Examples
>>> import cudf >>> idx = cudf.Index([-3, 10, 15, 20]) >>> idx.to_arrow() <pyarrow.lib.Int64Array object at 0x7fcaa6f53440> [ -3, 10, 15, 20 ]
-
to_dlpack()¶ Converts a cuDF object into a DLPack tensor.
DLPack is an open-source memory tensor structure: dmlc/dlpack.
This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.
- Parameters
- cudf_objDataFrame, Series, Index, or Column
- Returns
- pycapsule_objPyCapsule
Output DLPack tensor pointer which is encapsulated in a PyCapsule object.
-
to_frame(index=True, name=None)¶ Create a DataFrame with a column containing this Index
- Parameters
- indexboolean, default True
Set the index of the returned DataFrame as the original Index
- namestr, default None
Name to be used for the column
- Returns
- DataFrame
cudf DataFrame
-
to_pandas()¶ Convert to a Pandas Index.
Examples
>>> import cudf >>> idx = cudf.Index([-3, 10, 15, 20]) >>> idx Int64Index([-3, 10, 15, 20], dtype='int64') >>> idx.to_pandas() Int64Index([-3, 10, 15, 20], dtype='int64') >>> type(idx.to_pandas()) <class 'pandas.core.indexes.numeric.Int64Index'> >>> type(idx) <class 'cudf.core.index.GenericIndex'>
-
to_series(index=None, name=None)¶ Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.
- Parameters
- indexIndex, optional
Index of resulting Series. If None, defaults to original index.
- namestr, optional
Name of resulting Series. If None, defaults to name of original index.
- Returns
- Series
The dtype will be based on the type of the Index values.
-
unique()¶ Return unique values in the index.
- Returns
- Index without duplicates
-
property
values¶ Return an array representing the data in the Index.
- Returns
- arrayA cupy array of data in the Index.
Examples
>>> import cudf >>> index = cudf.Index([1, -10, 100, 20]) >>> index.values array([ 1, -10, 100, 20]) >>> type(index.values) <class 'cupy.core.core.ndarray'>
-
property
values_host¶ Return a numpy representation of the Index.
Only the values in the Index will be returned.
- Returns
- outnumpy.ndarray
The values of the Index.
Examples
>>> import cudf >>> index = cudf.Index([1, -10, 100, 20]) >>> index.values_host array([ 1, -10, 100, 20]) >>> type(index.values_host) <class 'numpy.ndarray'>
-
where(cond, other=None)¶ Replace values where the condition is False.
- Parameters
- condbool array-like with the same length as self
Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.
- other: scalar, or array-like
Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.
- Returns
- Same type as caller
Examples
>>> import cudf >>> index = cudf.Index([4, 3, 2, 1, 0]) >>> index Int64Index([4, 3, 2, 1, 0], dtype='int64') >>> index.where(index > 2, 15) Int64Index([4, 3, 15, 15, 15], dtype='int64')
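The replacement rule for where can be illustrated with a plain-Python reference sketch. Ordinary lists stand in for a cuDF Index, and where_sketch is a hypothetical helper written for illustration, not part of cudf. A scalar other is broadcast to every position, and None plays the role of a null.

```python
def where_sketch(values, cond, other=None):
    """Keep values[i] where cond[i] is True; otherwise take the value from other.

    Hypothetical reference helper: lists stand in for a cuDF Index and
    None stands in for a null.
    """
    if len(cond) != len(values):
        raise ValueError("cond must have the same length as values")
    # Broadcast a scalar `other` so every position has a replacement candidate.
    if isinstance(other, (list, tuple)):
        if len(other) != len(values):
            raise ValueError("array-like other must match the length of values")
        others = list(other)
    else:
        others = [other] * len(values)
    return [v if c else o for v, c, o in zip(values, cond, others)]


index = [4, 3, 2, 1, 0]
print(where_sketch(index, [v > 2 for v in index], 15))  # [4, 3, 15, 15, 15]
```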
Int16Index¶
-
class
cudf.core.index.Int16Index(data=None, dtype=None, copy=False, name=None)¶ Immutable, ordered and sliceable sequence of integer labels. The basic object storing row labels for all cuDF objects.
- Parameters
- dataarray-like (1-dimensional)/ DataFrame
If it is a DataFrame, it will return a MultiIndex
- dtypeNumPy dtype (default: None)
If dtype is None, we find the dtype that best fits the data.
- copybool
Make a copy of input data.
- nameobject
Name to be stored in the index.
- tupleize_colsbool (default: True)
When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.
- Returns
- Index
cudf Index
Examples
>>> import cudf >>> cudf.Index([1, 2, 3], dtype="uint64", name="a") UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]})) MultiIndex(levels=[0 1 1 2 dtype: int64, 0 2 1 3 dtype: int64], codes= a b 0 0 0 1 1 1)
- Attributes
dtypedtype of the underlying values in GenericIndex.
emptyIndicator whether Index is empty.
gpu_valuesView the data as a numba device array object
is_monotonicAlias for is_monotonic_increasing.
is_monotonic_decreasingReturn if the index is monotonic decreasing (only equal or decreasing) values.
is_monotonic_increasingReturn if the index is monotonic increasing (only equal or increasing) values.
is_uniqueReturn if the index has unique values.
nameReturns the name of the Index.
namesReturns a tuple containing the name of the Index.
ndimDimension of the data.
shapeReturns a tuple representing the dimensionality of the Index.
sizeReturn the number of elements in the underlying data.
valuesReturn an array representing the data in the Index.
values_hostReturn a numpy representation of the Index.
Methods
acos()Get Trigonometric inverse cosine, element-wise.
any()Return whether any element is True in the Index.
append(other)Append a collection of Index options together.
argsort([ascending])Return the integer indices that would sort the index.
asin()Get Trigonometric inverse sine, element-wise.
astype(dtype[, copy])Create an Index with values cast to dtypes.
atan()Get Trigonometric inverse tangent, element-wise.
clip([lower, upper, inplace, axis])Trim values at input threshold(s).
copy([deep])Make a copy of this object.
cos()Get Trigonometric cosine, element-wise.
difference(other[, sort])Return a new Index with elements from the index that are not in other.
drop_duplicates([keep])Return Index with duplicate values removed
dropna([how])Return an Index with null values removed.
equals(other)Determine if two Index objects contain the same elements.
exp()Get the exponential of all elements, element-wise.
fillna(value[, downcast])Fill null values with the specified value.
find_label_range(first, last)Find range that starts with first and ends with last, inclusively.
from_pandas(index[, nan_as_null])Convert from a Pandas Index.
get_level_values(level)Return an Index of values for requested level.
get_slice_bound(label, side, kind)Calculate slice bound that corresponds to given label.
interleave_columns()Interleave Series columns of a table into a single column.
isin(values)Return a boolean array where the index values are in values.
isna()Identify missing values.
isnull()Identify missing values.
join(other[, how, level, return_indexers, sort])Compute join_index and indexers to conform data structures to the new index.
log()Get the natural logarithm of all elements, element-wise.
mask(cond[, other, inplace])Replace values where the condition is True.
max()Return the maximum value of the Index.
memory_usage([deep])Memory usage of the values.
min()Return the minimum value of the Index.
notna()Identify non-missing values.
notnull()Identify non-missing values.
rank([axis, method, numeric_only, …])Compute numerical data ranks (1 through n) along axis.
rename(name[, inplace])Alter Index name.
repeat(repeats[, axis])Repeats elements consecutively.
sample([n, frac, replace, weights, …])Return a random sample of items from an axis of object.
scatter_by_map(map_index[, map_size, keep_index])Scatter to a list of dataframes.
searchsorted(values[, side, ascending, …])Find indices where elements should be inserted to maintain order
shift([periods, freq, axis, fill_value])Shift values by periods positions.
sin()Get Trigonometric sine, element-wise.
sort_values([return_indexer, ascending, key])Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
sqrt()Get the non-negative square-root of all elements, element-wise.
sum()Return the sum of all values of the Index.
take(indices)Gather only the specific subset of indices
tan()Get Trigonometric tangent, element-wise.
tile(count)Repeats the rows from self DataFrame count times to form a new DataFrame.
to_array([fillna])Get a dense numpy array for the data.
to_arrow()Convert Index to a PyArrow Array.
to_dlpack()Converts a cuDF object into a DLPack tensor.
to_frame([index, name])Create a DataFrame with a column containing this Index
to_pandas()Convert to a Pandas Index.
to_series([index, name])Create a Series with both index and values equal to the index keys.
unique()Return unique values in the index.
where(cond[, other])Replace values where the condition is False.
replace
-
acos()¶ Get Trigonometric inverse cosine, element-wise.
The inverse of cos so that, if y = x.cos(), then x = y.acos()
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5]) >>> ser.acos() 0 3.141593 1 1.570796 2 0.000000 3 1.240482 4 1.047198 dtype: float64
acos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5], ... 'second': [0.234, 0.3, 0.1]}) >>> df first second 0 -1.0 0.234 1 0.0 0.300 2 0.5 0.100 >>> df.acos() first second 0 3.141593 1.334606 1 1.570796 1.266104 2 1.047198 1.470629
acos operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64') >>> index.acos() Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0, 1.5707963267948966, 1.266103672779499], dtype='float64')
-
any()¶ Return whether any element is True in the Index.
-
append(other)¶ Append a collection of Index options together.
- Parameters
- otherIndex or list/tuple of indices
- Returns
- appendedIndex
Examples
>>> import cudf >>> idx = cudf.Index([1, 2, 10, 100]) >>> idx Int64Index([1, 2, 10, 100], dtype='int64') >>> other = cudf.Index([200, 400, 50]) >>> other Int64Index([200, 400, 50], dtype='int64') >>> idx.append(other) Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')
append also accepts a list of Index objects:
>>> idx.append([other, other]) Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
-
argsort(ascending=True, **kwargs)¶ Return the integer indices that would sort the index.
- Parameters
- ascendingbool, default True
If True, returns the indices for ascending order. If False, returns the indices for descending order.
- Returns
- arrayA cupy array containing Integer indices that
would sort the index if used as an indexer.
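As an assumption-labeled sketch, the ordering that argsort describes can be reproduced in plain Python. argsort_sketch is hypothetical; its stable handling of ties is a property of Python's sorted(), which cudf is not guaranteed to share.

```python
def argsort_sketch(values, ascending=True):
    """Return the positions that would sort `values`.

    sorted() is stable, and reverse=True still keeps tied elements in
    their original order, so ties resolve by position in both directions.
    """
    return sorted(range(len(values)), key=values.__getitem__, reverse=not ascending)


idx = [10, 100, 1, 1000]
order = argsort_sketch(idx)
print(order)                                   # [2, 0, 1, 3]
print([idx[i] for i in order])                 # [1, 10, 100, 1000]
print(argsort_sketch(idx, ascending=False))    # [3, 1, 0, 2]
```

Indexing the original list with the returned positions yields the sorted values, which is what "would sort the index if used as an indexer" means.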
-
asin()¶ Get Trigonometric inverse sine, element-wise.
The inverse of sine so that, if y = x.sin(), then x = y.asin()
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5]) >>> ser.asin() 0 -1.570796 1 0.000000 2 1.570796 3 0.330314 4 0.523599 dtype: float64
asin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5], ... 'second': [0.234, 0.3, 0.1]}) >>> df first second 0 -1.0 0.234 1 0.0 0.300 2 0.5 0.100 >>> df.asin() first second 0 -1.570796 0.236190 1 0.000000 0.304693 2 0.523599 0.100167
asin operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64') >>> index.asin() Float64Index([-1.5707963267948966, 0.41151684606748806, 1.5707963267948966, 0.3046926540153975], dtype='float64')
-
astype(dtype, copy=False)¶ Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.
- Parameters
- dtypenumpy dtype
Use a numpy.dtype to cast entire Index object to.
- copybool, default False
If copy is True, astype always returns a newly allocated object. If copy is False (the default) and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.
- Returns
- Index
Index with values cast to specified dtype.
-
atan()¶ Get Trigonometric inverse tangent, element-wise.
The inverse of tan so that, if y = x.tan(), then x = y.atan()
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10]) >>> ser 0 -1.00000 1 0.00000 2 1.00000 3 0.32434 4 0.50000 5 -10.00000 dtype: float64 >>> ser.atan() 0 -0.785398 1 0.000000 2 0.785398 3 0.313635 4 0.463648 5 -1.471128 dtype: float64
atan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5], ... 'second': [0.234, 0.3, 10]}) >>> df first second 0 -1.0 0.234 1 -10.0 0.300 2 0.5 10.000 >>> df.atan() first second 0 -0.785398 0.229864 1 -1.471128 0.291457 2 0.463648 1.471128
atan operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64') >>> index.atan() Float64Index([-0.7853981633974483, 0.3805063771123649, 0.7853981633974483, 0.0, 0.2914567944778671], dtype='float64')
-
clip(lower=None, upper=None, inplace=False, axis=1)¶ Trim values at input threshold(s).
Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.
- Parameters
- lowerscalar or array_like, default None
Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.
- upperscalar or array_like, default None
Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series/Index, upper is expected to be a scalar or an array of size 1.
- inplacebool, default False
Whether to perform the operation in place on the data.
- Returns
- Clipped DataFrame/Series/Index/MultiIndex
Examples
>>> import cudf >>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']}) >>> df.clip(lower=[2, 'b'], upper=[3, 'c']) a b 0 2 b 1 2 b 2 3 c 3 3 c
>>> df.clip(lower=None, upper=[3, 'c']) a b 0 1 a 1 2 b 2 3 c 3 3 c
>>> df.clip(lower=[2, 'b'], upper=None) a b 0 2 b 1 2 b 2 3 c 3 4 d
>>> df.clip(lower=2, upper=3, inplace=True) >>> df a b 0 2 2 1 2 3 2 3 3 3 3 3
>>> import cudf >>> sr = cudf.Series([1, 2, 3, 4]) >>> sr.clip(lower=2, upper=3) 0 2 1 2 2 3 3 3 dtype: int64
>>> sr.clip(lower=None, upper=3) 0 1 1 2 2 3 3 3 dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True) >>> sr 0 2 1 2 2 3 3 4 dtype: int64
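For one-dimensional input, the clipping rule reduces to comparing each element against two optional bounds. The helper below is a hypothetical pure-Python sketch (clip_sketch, not a cudf function) that reproduces the Series examples above; it ignores inplace and the per-column bounds of the DataFrame form.

```python
def clip_sketch(values, lower=None, upper=None):
    """Element-wise clip; a bound of None disables clipping on that side."""
    out = []
    for v in values:
        if lower is not None and v < lower:
            v = lower  # raise values below the minimum threshold
        if upper is not None and v > upper:
            v = upper  # lower values above the maximum threshold
        out.append(v)
    return out


print(clip_sketch([1, 2, 3, 4], lower=2, upper=3))  # [2, 2, 3, 3]
print(clip_sketch([1, 2, 3, 4], upper=3))           # [1, 2, 3, 3]
print(clip_sketch([1, 2, 3, 4], lower=2))           # [2, 2, 3, 4]
```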
-
copy(deep=True)¶ Make a copy of this object.
- Parameters
- deepbool, default True
Make a deep copy of the data. With deep=False, the underlying data is not copied.
- Returns
- copyIndex
-
cos()¶ Get Trigonometric cosine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360]) >>> ser 0 0.00000 1 0.32434 2 0.50000 3 45.00000 4 90.00000 5 180.00000 6 360.00000 dtype: float64 >>> ser.cos() 0 1.000000 1 0.947861 2 0.877583 3 0.525322 4 -0.448074 5 -0.598460 6 -0.283691 dtype: float64
cos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15], ... 'second': [100.0, 360, 720, 300]}) >>> df first second 0 0.0 100.0 1 5.0 360.0 2 10.0 720.0 3 15.0 300.0 >>> df.cos() first second 0 1.000000 0.862319 1 0.283662 -0.283691 2 -0.839072 -0.839039 3 -0.759688 -0.022097
cos operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90]) >>> index Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64') >>> index.cos() Float64Index([ 0.9210609940028851, 0.8623188722876839, -0.5984600690578581, -0.4480736161291701], dtype='float64')
-
difference(other, sort=None)¶ Return a new Index with elements from the index that are not in other.
This is the set difference of two Index objects.
- Parameters
- otherIndex or array-like
- sortFalse or None, default None
Whether to sort the resulting index. By default, cudf attempts to sort the values and catches any TypeError raised by comparing incomparable elements.
None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.
False : Do not sort the result.
- Returns
- differenceIndex
Examples
>>> import cudf >>> idx1 = cudf.Index([2, 1, 3, 4]) >>> idx1 Int64Index([2, 1, 3, 4], dtype='int64') >>> idx2 = cudf.Index([3, 4, 5, 6]) >>> idx2 Int64Index([3, 4, 5, 6], dtype='int64') >>> idx1.difference(idx2) Int64Index([1, 2], dtype='int64') >>> idx1.difference(idx2, sort=False) Int64Index([2, 1], dtype='int64')
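The same set difference can be sketched in plain Python, assuming pandas-like semantics in which the result holds unique values in first-seen order before the optional sort. difference_sketch is hypothetical; its sort=None branch mirrors the documented behavior of attempting a sort and falling back to the unsorted result when elements are incomparable.

```python
def difference_sketch(left, right, sort=None):
    """Unique elements of `left` that do not appear in `right`."""
    right_set = set(right)
    seen = set()
    out = []
    for v in left:
        # Keep the first occurrence of each value that is absent from `right`.
        if v not in right_set and v not in seen:
            seen.add(v)
            out.append(v)
    if sort is None:
        try:
            out = sorted(out)
        except TypeError:
            pass  # incomparable elements: keep first-seen order
    return out


print(difference_sketch([2, 1, 3, 4], [3, 4, 5, 6]))              # [1, 2]
print(difference_sketch([2, 1, 3, 4], [3, 4, 5, 6], sort=False))  # [2, 1]
```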
-
drop_duplicates(keep='first')¶ Return Index with duplicate values removed
- Parameters
- keep{‘first’, ‘last’, False}, default ‘first’
- ‘first’ : Drop duplicates except for the first occurrence.
- ‘last’ : Drop duplicates except for the last occurrence.
- False : Drop all duplicates.
- Returns
- deduplicatedIndex
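The three keep options can be sketched in plain Python. drop_duplicates_sketch is a hypothetical helper; it returns the surviving values in their original order, as pandas does, whereas the order cudf produces may differ.

```python
from collections import Counter


def drop_duplicates_sketch(values, keep="first"):
    """Remove duplicates, keeping the first, the last, or none of each group."""
    if keep is False:
        counts = Counter(values)
        return [v for v in values if counts[v] == 1]
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first', 'last', or False")
    positions = range(len(values))
    if keep == "last":
        positions = reversed(positions)  # scan from the end so the last copy wins
    seen, kept = set(), []
    for i in positions:
        if values[i] not in seen:
            seen.add(values[i])
            kept.append(i)
    # Report the survivors in their original order.
    return [values[i] for i in sorted(kept)]


data = [1, 2, 1, 3, 2]
print(drop_duplicates_sketch(data))               # [1, 2, 3]
print(drop_duplicates_sketch(data, keep="last"))  # [1, 3, 2]
print(drop_duplicates_sketch(data, keep=False))   # [3]
```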
-
dropna(how='any')¶ Return an Index with null values removed.
- Parameters
- how{‘any’, ‘all’}, default ‘any’
If the Index is a MultiIndex, drop the value when any or all levels are NaN.
- Returns
- validIndex
Examples
>>> import cudf >>> index = cudf.Index(['a', None, 'b', 'c']) >>> index StringIndex(['a' None 'b' 'c'], dtype='object') >>> index.dropna() StringIndex(['a' 'b' 'c'], dtype='object')
Using dropna on a MultiIndex:
>>> midx = cudf.MultiIndex( ... levels=[[1, None, 4, None], [1, 2, 5]], ... codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]], ... names=["x", "y"], ... ) >>> midx MultiIndex(levels=[0 1 1 null 2 4 3 null dtype: int64, 0 1 1 2 2 5 dtype: int64], codes= x y 0 0 0 1 0 2 2 1 1 3 2 1 4 3 0) >>> midx.dropna() MultiIndex(levels=[0 1 1 4 dtype: int64, 0 1 1 2 2 5 dtype: int64], codes= x y 0 0 0 1 0 2 2 1 1)
-
property
dtype¶ dtype of the underlying values in GenericIndex.
-
property
empty¶ Indicator whether Index is empty.
True if Index is entirely empty (no items).
- Returns
- outbool
If Index is empty, return True, if not return False.
-
equals(other)¶ Determine if two Index objects contain the same elements.
- Returns
- out: bool
True if “other” is an Index and it has the same elements as calling index; False otherwise.
-
exp()¶ Get the exponential of all elements, element-wise.
Exponential is the inverse of the log function, so that x.exp().log() = x
- Returns
- DataFrame/Series/Index
Result of the element-wise exponential.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100]) >>> ser 0 -1.00000 1 0.00000 2 1.00000 3 0.32434 4 0.50000 5 -10.00000 6 100.00000 dtype: float64 >>> ser.exp() 0 3.678794e-01 1 1.000000e+00 2 2.718282e+00 3 1.383117e+00 4 1.648721e+00 5 4.539993e-05 6 2.688117e+43 dtype: float64
exp operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5], ... 'second': [0.234, 0.3, 10]}) >>> df first second 0 -1.0 0.234 1 -10.0 0.300 2 0.5 10.000 >>> df.exp() first second 0 0.367879 1.263644 1 0.000045 1.349859 2 1.648721 22026.465795
exp operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64') >>> index.exp() Float64Index([0.36787944117144233, 1.4918246976412703, 2.718281828459045, 1.0, 1.3498588075760032], dtype='float64')
-
fillna(value, downcast=None)¶ Fill null values with the specified value.
- Parameters
- valuescalar
Scalar value to use to fill nulls. This value cannot be list-like.
- downcastdict, default is None
This parameter is currently non-functional.
- Returns
- filledIndex
Examples
>>> import cudf >>> index = cudf.Index([1, 2, None, 4]) >>> index Int64Index([1, 2, null, 4], dtype='int64') >>> index.fillna(3) Int64Index([1, 2, 3, 4], dtype='int64')
-
find_label_range(first, last)¶ Find range that starts with first and ends with last, inclusively.
- Returns
- begin, end2-tuple of int
The starting index and the ending index. The last value occurs at position end - 1.
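Assuming the labels are sorted in ascending order, the returned pair corresponds to two binary searches: one for the left edge of first and one for the right edge of last. The sketch below uses Python's bisect module; find_label_range_sketch is hypothetical and not cudf's implementation.

```python
from bisect import bisect_left, bisect_right


def find_label_range_sketch(labels, first, last):
    """For ascending `labels`, return (begin, end) so that labels[begin:end]
    covers every label from `first` through `last`, inclusively."""
    begin = bisect_left(labels, first)  # first position holding a label >= first
    end = bisect_right(labels, last)    # one past the last label <= last
    return begin, end


labels = [10, 20, 20, 30, 40]
begin, end = find_label_range_sketch(labels, 20, 30)
print(begin, end, labels[begin:end])  # 1 4 [20, 20, 30]
```

The last matching label, 30, sits at position end - 1 = 3, consistent with the return description above.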
-
classmethod
from_pandas(index, nan_as_null=None)¶ Convert from a Pandas Index.
- Parameters
- indexPandas Index object
A Pandas Index object which has to be converted to cuDF Index.
- nan_as_nullbool, Default None
If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.
- Raises
- TypeError for invalid input type.
Examples
>>> import cudf >>> import pandas as pd >>> import numpy as np >>> data = [10, 20, 30, np.nan] >>> pdi = pd.Index(data) >>> cudf.core.index.Index.from_pandas(pdi) Index(['10.0', '20.0', '30.0', 'null'], dtype='object') >>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False) Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
-
get_level_values(level)¶ Return an Index of values for requested level.
This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.
- Parameters
- levelint or str
It is either the integer position or the name of the level.
- Returns
- Index
Calling object, as there is only one level in the Index.
See also
cudf.core.multiindex.MultiIndex.get_level_valuesGet values for a level of a MultiIndex.
Notes
For Index, level should be 0, since there are no multiple levels.
Examples
>>> import cudf >>> idx = cudf.core.index.StringIndex(["a","b","c"]) >>> idx.get_level_values(0) StringIndex(['a' 'b' 'c'], dtype='object')
-
get_slice_bound(label, side, kind)¶ Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.
- Parameters
- labelobject
- side{‘left’, ‘right’}
- kind{‘ix’, ‘loc’, ‘getitem’}
- Returns
- int
Index of label.
-
property
gpu_values¶ View the data as a numba device array object
-
interleave_columns()¶ Interleave Series columns of a table into a single column.
Converts the column major table cols into a row major column.
- Parameters
- colsinput Table containing columns to interleave.
- Returns
- The interleaved columns as a single column
Examples
>>> import cudf >>> df = cudf.DataFrame({0: ['A1', 'A2', 'A3'], 1: ['B1', 'B2', 'B3']}) >>> df 0 1 0 A1 B1 1 A2 B2 2 A3 B3 >>> df.interleave_columns() 0 A1 1 B1 2 A2 3 B2 4 A3 5 B3
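The row-major flattening described above amounts to reading each row across all columns before moving on to the next row. The hypothetical helper below sketches that with plain lists, one list per column; it is an illustration, not cudf's implementation.

```python
def interleave_columns_sketch(columns):
    """Row-major flatten: row 0 of every column, then row 1, and so on."""
    if len({len(c) for c in columns}) > 1:
        raise ValueError("all columns must have the same length")
    # zip(*columns) yields one tuple per row; flatten those tuples in order.
    return [value for row in zip(*columns) for value in row]


print(interleave_columns_sketch([["A1", "A2", "A3"], ["B1", "B2", "B3"]]))
# ['A1', 'B1', 'A2', 'B2', 'A3', 'B3']
```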
-
property
is_monotonic¶ Alias for is_monotonic_increasing.
-
property
is_monotonic_decreasing¶ Return if the index is monotonic decreasing (only equal or decreasing) values.
-
property
is_monotonic_increasing¶ Return if the index is monotonic increasing (only equal or increasing) values.
-
property
is_unique¶ Return if the index has unique values.
-
isin(values)¶ Return a boolean array where the index values are in values.
Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.
- Parameters
- valuesset, list-like, Index
Sought values.
- Returns
- is_containedcupy array
CuPy array of boolean values.
-
isna()¶ Identify missing values. Alias for isnull.
-
isnull()¶ Identify missing values.
-
join(other, how='left', level=None, return_indexers=False, sort=False)¶ Compute join_index and indexers to conform data structures to the new index.
- Parameters
- otherIndex.
- how{‘left’, ‘right’, ‘inner’, ‘outer’}
- return_indexersbool, default False
- sortbool, default False
Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).
- Returns: index
Examples
>>> import cudf >>> lhs = cudf.DataFrame( ... {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b'] ... ).index >>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index >>> lhs.join(rhs, how='inner') MultiIndex(levels=[0 1 1 3 dtype: int64, 0 2 1 4 dtype: int64], codes= a b 0 1 1 1 0 0)
-
log()¶ Get the natural logarithm of all elements, element-wise.
Natural logarithm is the inverse of the exp function, so that x.log().exp() = x
- Returns
- DataFrame/Series/Index
Result of the element-wise natural logarithm.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100]) >>> ser 0 -1.00000 1 0.00000 2 1.00000 3 0.32434 4 0.50000 5 -10.00000 6 100.00000 dtype: float64 >>> ser.log() 0 NaN 1 -inf 2 0.000000 3 -1.125963 4 -0.693147 5 NaN 6 4.605170 dtype: float64
log operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5], ... 'second': [0.234, 0.3, 10]}) >>> df first second 0 -1.0 0.234 1 -10.0 0.300 2 0.5 10.000 >>> df.log() first second 0 NaN -1.452434 1 NaN -1.203973 2 -0.693147 2.302585
log operation on Index:
>>> index = cudf.Index([10, 11, 500.0]) >>> index Float64Index([10.0, 11.0, 500.0], dtype='float64') >>> index.log() Float64Index([2.302585092994046, 2.3978952727983707, 6.214608098422191], dtype='float64')
-
mask(cond, other=None, inplace=False)¶ Replace values where the condition is True.
- Parameters
- condbool Series/DataFrame, array-like
Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.
- other: scalar, list of scalars, Series/DataFrame
Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.
DataFrame expects only a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.
Series expects only a scalar or a Series-like object of the same length.
- inplacebool, default False
Whether to perform the operation in place on the data.
- Returns
- Same type as caller
Examples
>>> import cudf >>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]}) >>> df.mask(df % 2 == 0, [-1, -1]) A B 0 1 3 1 -1 5 2 5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0]) >>> ser.mask(ser > 2, 10) 0 10 1 10 2 2 3 1 4 0 dtype: int64 >>> ser.mask(ser > 2) 0 null 1 null 2 2 3 1 4 0 dtype: int64
-
max()¶ Return the maximum value of the Index.
- Returns
- scalar
Maximum value.
See also
Index.minReturn the minimum value in an Index.
cudf.core.series.Series.maxReturn the maximum value in a Series.
cudf.core.dataframe.DataFrame.maxReturn the maximum values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.max() 3
-
memory_usage(deep=False)¶ Memory usage of the values.
- Parameters
- deepbool
Introspect the data deeply, interrogate object dtypes for system-level memory consumption.
- Returns
- bytes used
-
min()¶ Return the minimum value of the Index.
- Returns
- scalar
Minimum value.
See also
Index.maxReturn the maximum value in an Index.
cudf.core.series.Series.minReturn the minimum value in a Series.
cudf.core.dataframe.DataFrame.minReturn the minimum values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.min() 1
-
property
name¶ Returns the name of the Index.
-
property
names¶ Returns a tuple containing the name of the Index.
-
property
ndim¶ Dimension of the data. Apart from MultiIndex, ndim is always 1.
-
notna()¶ Identify non-missing values. Alias for notnull.
-
notnull()¶ Identify non-missing values.
-
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)¶ Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.
- Parameters
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
Index to direct ranking.
- method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’
How to rank the group of records that have the same value (i.e. ties):
* average: average rank of the group
* min: lowest rank in the group
* max: highest rank in the group
* first: ranks assigned in order they appear in the array
* dense: like ‘min’, but rank always increases by 1 between groups.
- numeric_onlybool, optional
For DataFrame objects, rank only numeric columns if set to True.
- na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’
How to rank NaN values:
* keep: assign NaN rank to NaN values
* top: assign smallest rank to NaN values if ascending
* bottom: assign highest rank to NaN values if ascending.
- ascendingbool, default True
Whether or not the elements should be ranked in ascending order.
- pctbool, default False
Whether or not to display the returned rankings in percentile form.
- Returns
- same type as caller
Return a Series or DataFrame with data ranks as values.
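The tie-breaking methods listed above can be sketched in plain Python. rank_sketch is a hypothetical reference helper: it ranks a list along its only axis and, as a simplifying assumption, ignores na_option, numeric_only and pct, so it should not be read as cudf's implementation.

```python
def rank_sketch(values, method="average", ascending=True):
    """Rank values 1..n, resolving ties as the `method` parameter describes."""
    if method not in {"average", "min", "max", "first", "dense"}:
        raise ValueError(f"unknown method: {method!r}")
    n = len(values)
    # Stable sort (also with reverse=True), so ties keep their order of appearance.
    order = sorted(range(n), key=values.__getitem__, reverse=not ascending)
    ranks = [0.0] * n
    dense = 0
    i = 0
    while i < n:
        # Extend j to the end of the run of values equal to values[order[i]].
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        dense += 1
        for k in range(i, j + 1):
            if method == "average":
                r = (i + j + 2) / 2  # mean of the 1-based ranks i+1 .. j+1
            elif method == "min":
                r = i + 1
            elif method == "max":
                r = j + 1
            elif method == "first":
                r = k + 1
            else:  # "dense"
                r = dense
            ranks[order[k]] = float(r)
        i = j + 1
    return ranks


print(rank_sketch([10, 20, 10, 30]))                  # [1.5, 3.0, 1.5, 4.0]
print(rank_sketch([10, 20, 10, 30], method="dense"))  # [1.0, 2.0, 1.0, 3.0]
```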
-
rename(name, inplace=False)¶ Alter Index name.
Defaults to returning new index.
- Parameters
- namelabel
Name(s) to set.
- Returns
- Index
-
repeat(repeats, axis=None)¶ Repeats elements consecutively.
Returns a new object of the caller's type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.
- Parameters
- repeatsint, or array of ints
The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.
- Returns
- Series/DataFrame/Index
A newly created object of same type as caller with repeated elements.
Examples
>>> import cudf >>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]}) >>> df a b 0 1 10 1 2 20 2 3 30 >>> df.repeat(3) a b 0 1 10 0 1 10 0 1 10 1 2 20 1 2 20 1 2 20 2 3 30 2 3 30 2 3 30
Repeat on Series
>>> s = cudf.Series([0, 2]) >>> s 0 0 1 2 dtype: int64 >>> s.repeat([3, 4]) 0 0 0 0 0 0 1 2 1 2 1 2 1 2 dtype: int64 >>> s.repeat(2) 0 0 0 0 1 2 1 2 dtype: int64
Repeat on Index
>>> index = cudf.Index([10, 22, 33, 55]) >>> index Int64Index([10, 22, 33, 55], dtype='int64') >>> index.repeat(5) Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33, 33, 33, 33, 33, 55, 55, 55, 55, 55], dtype='int64')
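A plain-Python sketch of the element-wise repetition, accepting either a single count or one count per element. repeat_sketch is a hypothetical helper that mirrors the Series examples above; it is not cudf's implementation.

```python
def repeat_sketch(values, repeats):
    """Repeat each element consecutively; `repeats` is an int or one int per element."""
    counts = [repeats] * len(values) if isinstance(repeats, int) else list(repeats)
    if len(counts) != len(values):
        raise ValueError("repeats must be an int or have one entry per element")
    if any(c < 0 for c in counts):
        raise ValueError("repeats must be non-negative")
    return [v for v, c in zip(values, counts) for _ in range(c)]


print(repeat_sketch([0, 2], [3, 4]))  # [0, 0, 0, 2, 2, 2, 2]
print(repeat_sketch([0, 2], 2))       # [0, 0, 2, 2]
print(repeat_sketch([0, 2], 0))       # []
```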
-
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)¶ Return a random sample of items from an axis of object.
You can use random_state for reproducibility.
- Parameters
- nint, optional
Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.
- fracfloat, optional
Fraction of axis items to return. Cannot be used with n.
- replacebool, default False
Allow or disallow sampling of the same row more than once. replace == True is not yet supported for axis = 1/“columns”.
- weightsstr or ndarray-like, optional
Only supported for axis=1/“columns”.
- random_stateint or None, default None
Seed for the random number generator (if int), or None. If None, a random seed will be chosen.
- axis{0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index don’t support axis=1.
- Returns
- Series or DataFrame or Index
A new object of same type as caller containing n items randomly sampled from the caller object.
Examples
>>> import cudf >>> df = cudf.DataFrame({"a":[1, 2, 3, 4, 5]}) >>> df.sample(3) a 1 2 3 4 0 1
>>> sr = cudf.Series([1, 2, 3, 4, 5]) >>> sr.sample(10, replace=True) 1 4 3 1 2 4 0 5 0 1 4 5 4 1 0 2 0 3 3 2 dtype: int64
>>> df = cudf.DataFrame( ... {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]}) >>> df.sample(2, axis=1) a c 0 1 3 1 2 4
-
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)¶ Scatter to a list of dataframes.
Uses map_index to determine the destination of each row of the original DataFrame.
- Parameters
- map_indexSeries, str or list-like
Scatter assignment for each row
- map_sizeint
Length of output list. Must be greater than or equal to the number of unique values in map_index.
- keep_indexbool
Conserve original index values for each row
- Returns
- A list of cudf.DataFrame objects.
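The row routing can be sketched with plain lists, under the assumption that map_index holds integer partition ids in the range 0 to map_size - 1. scatter_by_map_sketch is hypothetical; it does not model the column-name form of map_index or the keep_index option.

```python
def scatter_by_map_sketch(rows, map_index, map_size=None):
    """Send rows[i] to output list map_index[i]; row order is kept within each part."""
    if len(rows) != len(map_index):
        raise ValueError("map_index must have one entry per row")
    if map_size is None:
        map_size = max(map_index, default=-1) + 1
    parts = [[] for _ in range(map_size)]
    for row, dest in zip(rows, map_index):
        if not 0 <= dest < map_size:
            raise ValueError(f"destination {dest} is outside range(map_size)")
        parts[dest].append(row)
    return parts


print(scatter_by_map_sketch(["r0", "r1", "r2", "r3"], [1, 0, 1, 2]))
# [['r1'], ['r0', 'r2'], ['r3']]
```

Passing a map_size larger than the number of distinct ids simply yields some empty outputs, which is why the requirement above is a lower bound.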
-
searchsorted(values, side='left', ascending=True, na_position='last')¶ Find indices where elements should be inserted to maintain order
- Parameters
- valuesFrame (Shape must be consistent with self)
Values to be hypothetically inserted into self
- sidestr {‘left’, ‘right’} optional, default ‘left’
If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.
- ascendingbool optional, default True
Sorted Frame is in ascending order (otherwise descending)
- na_positionstr {‘last’, ‘first’} optional, default ‘last’
Position of null values in sorted order
- Returns
- 1-D cupy array of insertion points
Examples
>>> s = cudf.Series([1, 2, 3]) >>> s.searchsorted(4) 3 >>> s.searchsorted([0, 4]) array([0, 3], dtype=int32) >>> s.searchsorted([1, 3], side='left') array([0, 2], dtype=int32) >>> s.searchsorted([1, 3], side='right') array([1, 3], dtype=int32)
If the values are not monotonically sorted, wrong locations may be returned:
>>> s = cudf.Series([2, 1, 3]) >>> s.searchsorted(1) 0 # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]}) >>> df a b 0 1 10 1 3 12 2 5 14 3 7 16 >>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6], ... 'b': [10, 11, 13, 15]}) >>> values_df a b 0 0 10 1 2 11 2 5 13 3 6 15 >>> df.searchsorted(values_df, ascending=False) array([4, 4, 4, 0], dtype=int32)
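For a single ascending column without nulls, the side parameter maps directly onto Python's bisect_left and bisect_right. searchsorted_sketch is a hypothetical helper; it does not model ascending=False, na_position, or multi-column frames, and, like the method itself, it returns meaningless positions for unsorted input.

```python
from bisect import bisect_left, bisect_right


def searchsorted_sketch(sorted_values, values, side="left"):
    """Insertion points that keep an ascending list sorted."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    find = bisect_left if side == "left" else bisect_right
    return [find(sorted_values, v) for v in values]


s = [1, 2, 3]
print(searchsorted_sketch(s, [0, 4]))                # [0, 3]
print(searchsorted_sketch(s, [1, 3], side="left"))   # [0, 2]
print(searchsorted_sketch(s, [1, 3], side="right"))  # [1, 3]
print(searchsorted_sketch([2, 1, 3], [1]))           # [0], wrong because the input is unsorted
```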
-
property
shape¶ Returns a tuple representing the dimensionality of the Index.
-
shift(periods=1, freq=None, axis=0, fill_value=None)¶ Shift values by periods positions.
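This entry gives only a one-line summary. Assuming pandas-like semantics (positive periods move values toward the end, negative periods toward the start, and vacated slots receive fill_value, with None standing in for a null), the behavior can be sketched with plain lists. shift_sketch is hypothetical and ignores freq and axis.

```python
def shift_sketch(values, periods=1, fill_value=None):
    """Shift `values` by `periods` positions, filling vacated slots with fill_value."""
    n = len(values)
    if abs(periods) >= n:
        return [fill_value] * n  # everything shifted out of range
    if periods >= 0:
        return [fill_value] * periods + values[: n - periods]
    return values[-periods:] + [fill_value] * (-periods)


print(shift_sketch([1, 2, 3, 4]))                           # [None, 1, 2, 3]
print(shift_sketch([1, 2, 3, 4], periods=-2, fill_value=0))  # [3, 4, 0, 0]
```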
-
sin()¶ Get Trigonometric sine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360]) >>> ser 0 0.00000 1 0.32434 2 0.50000 3 45.00000 4 90.00000 5 180.00000 6 360.00000 dtype: float64 >>> ser.sin() 0 0.000000 1 0.318683 2 0.479426 3 0.850904 4 0.893997 5 -0.801153 6 0.958916 dtype: float64
sin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15], ... 'second': [100.0, 360, 720, 300]}) >>> df first second 0 0.0 100.0 1 5.0 360.0 2 10.0 720.0 3 15.0 300.0 >>> df.sin() first second 0 0.000000 -0.506366 1 -0.958924 0.958916 2 -0.544021 -0.544072 3 0.650288 -0.999756
sin operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90]) >>> index Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64') >>> index.sin() Float64Index([-0.3894183423086505, -0.5063656411097588, 0.8011526357338306, 0.8939966636005579], dtype='float64')
-
property
size¶ Return the number of elements in the underlying data.
- Returns
- sizeSize of the DataFrame / Index / Series / MultiIndex
Examples
Size of an empty dataframe is 0.
>>> import cudf >>> df = cudf.DataFrame() >>> df Empty DataFrame Columns: [] Index: [] >>> df.size 0 >>> df = cudf.DataFrame(index=[1, 2, 3]) >>> df Empty DataFrame Columns: [] Index: [1, 2, 3] >>> df.size 0
DataFrame with values
>>> df = cudf.DataFrame({'a': [10, 11, 12], ... 'b': ['hello', 'rapids', 'ai']}) >>> df a b 0 10 hello 1 11 rapids 2 12 ai >>> df.size 6 >>> df.index RangeIndex(start=0, stop=3) >>> df.index.size 3
Size of an Index
>>> index = cudf.Index([]) >>> index Float64Index([], dtype='float64') >>> index.size 0 >>> index = cudf.Index([1, 2, 3, 10]) >>> index Int64Index([1, 2, 3, 10], dtype='int64') >>> index.size 4
Size of a MultiIndex
>>> midx = cudf.MultiIndex( ... levels=[["a", "b", "c", None], ["1", None, "5"]], ... codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]], ... names=["x", "y"], ... ) >>> midx MultiIndex(levels=[0 a 1 b 2 c 3 None dtype: object, 0 1 1 None 2 5 dtype: object], codes= x y 0 0 0 1 0 2 2 1 1 3 2 1 4 3 0) >>> midx.size 5
-
sort_values(return_indexer=False, ascending=True, key=None)
Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
- Parameters
- return_indexer : bool, default False
Whether to also return the indices that would sort the index.
- ascending : bool, default True
Whether to sort the index values in ascending order.
- key : None, optional
This parameter is NON-FUNCTIONAL.
- Returns
- sorted_index : Index
Sorted copy of the index.
- indexer : cupy.ndarray, optional
The indices that the index itself was sorted by.
See also
cudf.core.series.Series.sort_values : Sort values of a Series.
cudf.core.dataframe.DataFrame.sort_values : Sort values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([10, 100, 1, 1000]) >>> idx Int64Index([10, 100, 1, 1000], dtype='int64')
Sort values in ascending order (default behavior):
>>> idx.sort_values() Int64Index([1, 10, 100, 1000], dtype='int64')
Sort values in descending order, and also get the indices idx was sorted by:
>>> idx.sort_values(ascending=False, return_indexer=True) (Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2], dtype=int32))
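To make the return_indexer contract concrete, here is a pure-Python model in which a plain list stands in for the Index. It is an illustrative sketch, not cuDF's GPU implementation, and the name sort_values_model is hypothetical. The indexer is the list of positions that, when used to gather from the original data, yields the sorted copy.

```python
def sort_values_model(values, return_indexer=False, ascending=True):
    """Hypothetical pure-Python model of Index.sort_values."""
    # Positions of the elements in sorted order (Python's sort is stable).
    indexer = sorted(range(len(values)), key=lambda i: values[i],
                     reverse=not ascending)
    # Gathering with the indexer produces the sorted copy.
    sorted_values = [values[i] for i in indexer]
    if return_indexer:
        return sorted_values, indexer
    return sorted_values


idx = [10, 100, 1, 1000]
print(sort_values_model(idx))  # [1, 10, 100, 1000]
print(sort_values_model(idx, return_indexer=True, ascending=False))
# ([1000, 100, 10, 1], [3, 1, 0, 2])
```

The descending result matches the indexer in the doctest above: position 3 holds 1000, position 1 holds 100, and so on.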
-
sqrt()
Get the non-negative square-root of all elements, element-wise.
- Returns
- DataFrame/Series/Index
Result of the non-negative square-root of each element.
Examples
>>> import cudf >>> ser = cudf.Series([10, 25, 81, 1.0, 100]) >>> ser 0 10.0 1 25.0 2 81.0 3 1.0 4 100.0 dtype: float64 >>> ser.sqrt() 0 3.162278 1 5.000000 2 9.000000 3 1.000000 4 10.000000 dtype: float64
sqrt operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-10.0, 100, 625], ... 'second': [1, 2, 0.4]}) >>> df first second 0 -10.0 1.0 1 100.0 2.0 2 625.0 0.4 >>> df.sqrt() first second 0 NaN 1.000000 1 10.0 1.414214 2 25.0 0.632456
sqrt operation on Index:
>>> index = cudf.Index([-10.0, 100, 625]) >>> index Float64Index([-10.0, 100.0, 625.0], dtype='float64') >>> index.sqrt() Float64Index([nan, 10.0, 25.0], dtype='float64')
-
sum()
Return the sum of all values of the Index.
- Returns
- scalar
Sum of all values.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.sum() 6
-
take(indices)
Gather only the specified subset of indices.
- Parameters
- indices : array-like
An array-like that maps to values contained in this Index.
-
tan()
Get Trigonometric tangent, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360]) >>> ser 0 0.00000 1 0.32434 2 0.50000 3 45.00000 4 90.00000 5 180.00000 6 360.00000 dtype: float64 >>> ser.tan() 0 0.000000 1 0.336213 2 0.546302 3 1.619775 4 -1.995200 5 1.338690 6 -3.380140 dtype: float64
tan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15], ... 'second': [100.0, 360, 720, 300]}) >>> df first second 0 0.0 100.0 1 5.0 360.0 2 10.0 720.0 3 15.0 300.0 >>> df.tan() first second 0 0.000000 -0.587214 1 -3.380515 -3.380140 2 0.648361 0.648446 3 -0.855993 45.244742
tan operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90]) >>> index Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64') >>> index.tan() Float64Index([-0.4227932187381618, -0.587213915156929, -1.3386902103511544, -1.995200412208242], dtype='float64')
-
tile(count)
Repeats the rows of self count times to form a new DataFrame.
- Parameters
- self : input Table containing the rows to tile.
- count : Number of times to tile the rows. Must be non-negative.
- Returns
- The table containing the tiled rows.
Examples
>>> import cudf >>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]]) >>> count = 2 >>> df.tile(count) 0 1 2 0 8 4 7 1 5 2 3 0 8 4 7 1 5 2 3
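The key property of tile is that the whole block of rows is repeated as a unit. A minimal pure-Python sketch, assuming rows are modelled as a list of tuples (tile_model is a hypothetical name, not cuDF API):

```python
def tile_model(rows, count):
    """Hypothetical model of tile(): repeat the whole block of rows."""
    if count < 0:
        raise ValueError("count must be non-negative")
    # List repetition concatenates full copies, so row order is preserved.
    return rows * count


rows = [(8, 4, 7), (5, 2, 3)]
print(tile_model(rows, 2))
# [(8, 4, 7), (5, 2, 3), (8, 4, 7), (5, 2, 3)]
```

Contrast this with repeat, which repeats each element consecutively (row 0, row 0, row 1, row 1 rather than row 0, row 1, row 0, row 1).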
-
to_array(fillna=None)
Get a dense numpy array for the data.
- Parameters
- fillna : str or None
Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non-integral dtypes are promoted to np.float64.
Notes
If fillna is None, null values are skipped; therefore, the output can be smaller than the input.
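The two fillna modes differ in output length. A pure-Python sketch of the documented behaviour, assuming Python None stands in for a cuDF null (to_array_model is hypothetical, and the float conversion reflects that NaN cannot be stored in an integer array):

```python
import math


def to_array_model(values, fillna=None):
    """Hypothetical model of to_array(fillna=...); None marks a null."""
    if fillna is None:
        # Nulls are skipped, so the result can be shorter than the input.
        return [v for v in values if v is not None]
    if fillna == "pandas":
        # Nulls become NaN; NaN is a float, so values are returned as floats.
        return [math.nan if v is None else float(v) for v in values]
    # This model only covers the two documented options.
    raise ValueError("fillna must be None or 'pandas'")


data = [1, None, 3]
print(to_array_model(data))            # [1, 3]
print(to_array_model(data, "pandas"))  # [1.0, nan, 3.0]
```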
-
to_arrow()
Convert Index to a PyArrow Array.
Examples
>>> import cudf >>> idx = cudf.Index([-3, 10, 15, 20]) >>> idx.to_arrow() <pyarrow.lib.Int64Array object at 0x7fcaa6f53440> [ -3, 10, 15, 20 ]
-
to_dlpack()
Converts a cuDF object into a DLPack tensor.
DLPack is an open-source memory tensor structure: dmlc/dlpack.
This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep-copies the data from the cuDF object into the DLPack tensor.
- Parameters
- cudf_obj : DataFrame, Series, Index, or Column
- Returns
- pycapsule_obj : PyCapsule
Output DLPack tensor pointer which is encapsulated in a PyCapsule object.
-
to_frame(index=True, name=None)
Create a DataFrame with a column containing this Index.
- Parameters
- index : boolean, default True
Set the index of the returned DataFrame as the original Index.
- name : str, default None
Name to be used for the column.
- Returns
- DataFrame
cudf DataFrame
-
to_pandas()
Convert to a Pandas Index.
Examples
>>> import cudf >>> idx = cudf.Index([-3, 10, 15, 20]) >>> idx Int64Index([-3, 10, 15, 20], dtype='int64') >>> idx.to_pandas() Int64Index([-3, 10, 15, 20], dtype='int64') >>> type(idx.to_pandas()) <class 'pandas.core.indexes.numeric.Int64Index'> >>> type(idx) <class 'cudf.core.index.GenericIndex'>
-
to_series(index=None, name=None)
Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.
- Parameters
- index : Index, optional
Index of resulting Series. If None, defaults to original index.
- name : str, optional
Name of resulting Series. If None, defaults to name of original index.
- Returns
- Series
The dtype will be based on the type of the Index values.
-
unique()
Return unique values in the index.
- Returns
- Index without duplicates
-
property
values
Return an array representing the data in the Index.
- Returns
- array : A cupy array of data in the Index.
Examples
>>> import cudf >>> index = cudf.Index([1, -10, 100, 20]) >>> index.values array([ 1, -10, 100, 20]) >>> type(index.values) <class 'cupy.core.core.ndarray'>
-
property
values_host
Return a numpy representation of the Index.
Only the values in the Index will be returned.
- Returns
- out : numpy.ndarray
The values of the Index.
Examples
>>> import cudf >>> index = cudf.Index([1, -10, 100, 20]) >>> index.values_host array([ 1, -10, 100, 20]) >>> type(index.values_host) <class 'numpy.ndarray'>
-
where(cond, other=None)
Replace values where the condition is False.
- Parameters
- cond : bool array-like with the same length as self
Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.
- other : scalar, or array-like
Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.
- Returns
- Same type as caller
Examples
>>> import cudf >>> index = cudf.Index([4, 3, 2, 1, 0]) >>> index Int64Index([4, 3, 2, 1, 0], dtype='int64') >>> index.where(index > 2, 15) Int64Index([4, 3, 15, 15, 15], dtype='int64')
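The selection rule can be modelled in a few lines of plain Python. This sketch assumes a list stands in for the Index and None stands in for a null when other is omitted; where_model is a hypothetical name, not cuDF API.

```python
def where_model(values, cond, other=None):
    """Hypothetical model of Index.where; None stands in for a null."""
    if len(cond) != len(values):
        raise ValueError("cond must have the same length as values")
    if isinstance(other, (list, tuple)):
        # Array-like other: take the replacement at the same position.
        return [v if c else o for v, c, o in zip(values, cond, other)]
    # Scalar (or None) other: use it for every position where cond is False.
    return [v if c else other for v, c in zip(values, cond)]


index = [4, 3, 2, 1, 0]
cond = [v > 2 for v in index]
print(where_model(index, cond, 15))  # [4, 3, 15, 15, 15]
print(where_model(index, cond))      # [4, 3, None, None, None]
```

Under this model, mask(cond, other) is the complement: it behaves like where_model(values, [not c for c in cond], other).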
Int32Index
-
class
cudf.core.index.Int32Index(data=None, dtype=None, copy=False, name=None)
Immutable, ordered and sliceable sequence of integer labels. The basic object storing row labels for all cuDF objects.
- Parameters
- data : array-like (1-dimensional) / DataFrame
If it is a DataFrame, it will return a MultiIndex.
- dtype : NumPy dtype (default: None)
If dtype is None, we find the dtype that best fits the data.
- copy : bool
Make a copy of input data.
- name : object
Name to be stored in the index.
- tupleize_cols : bool (default: True)
When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.
- Returns
- Index
cudf Index
Examples
>>> import cudf >>> cudf.Index([1, 2, 3], dtype="uint64", name="a") UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]})) MultiIndex(levels=[0 1 1 2 dtype: int64, 0 2 1 3 dtype: int64], codes= a b 0 0 0 1 1 1)
- Attributes
dtype : dtype of the underlying values in GenericIndex.
empty : Indicator whether Index is empty.
gpu_values : View the data as a numba device array object.
is_monotonic : Alias for is_monotonic_increasing.
is_monotonic_decreasing : Return whether the index has monotonic decreasing (only equal or decreasing) values.
is_monotonic_increasing : Return whether the index has monotonic increasing (only equal or increasing) values.
is_unique : Return whether the index has unique values.
name : Returns the name of the Index.
names : Returns a tuple containing the name of the Index.
ndim : Dimension of the data.
shape : Returns a tuple representing the dimensionality of the Index.
size : Return the number of elements in the underlying data.
values : Return an array representing the data in the Index.
values_host : Return a numpy representation of the Index.
Methods
acos() : Get Trigonometric inverse cosine, element-wise.
any() : Return whether any element is True in the Index.
append(other) : Append a collection of Index objects together.
argsort([ascending]) : Return the integer indices that would sort the index.
asin() : Get Trigonometric inverse sine, element-wise.
astype(dtype[, copy]) : Create an Index with values cast to dtypes.
atan() : Get Trigonometric inverse tangent, element-wise.
clip([lower, upper, inplace, axis]) : Trim values at input threshold(s).
copy([deep]) : Make a copy of this object.
cos() : Get Trigonometric cosine, element-wise.
difference(other[, sort]) : Return a new Index with elements from the index that are not in other.
drop_duplicates([keep]) : Return Index with duplicate values removed.
dropna([how]) : Return an Index with null values removed.
equals(other) : Determine if two Index objects contain the same elements.
exp() : Get the exponential of all elements, element-wise.
fillna(value[, downcast]) : Fill null values with the specified value.
find_label_range(first, last) : Find range that starts with first and ends with last, inclusively.
from_pandas(index[, nan_as_null]) : Convert from a Pandas Index.
get_level_values(level) : Return an Index of values for requested level.
get_slice_bound(label, side, kind) : Calculate slice bound that corresponds to given label.
interleave_columns() : Interleave Series columns of a table into a single column.
isin(values) : Return a boolean array where the index values are in values.
isna() : Identify missing values.
isnull() : Identify missing values.
join(other[, how, level, return_indexers, sort]) : Compute join_index and indexers to conform data structures to the new index.
log() : Get the natural logarithm of all elements, element-wise.
mask(cond[, other, inplace]) : Replace values where the condition is True.
max() : Return the maximum value of the Index.
memory_usage([deep]) : Memory usage of the values.
min() : Return the minimum value of the Index.
notna() : Identify non-missing values.
notnull() : Identify non-missing values.
rank([axis, method, numeric_only, …]) : Compute numerical data ranks (1 through n) along axis.
rename(name[, inplace]) : Alter Index name.
repeat(repeats[, axis]) : Repeats elements consecutively.
sample([n, frac, replace, weights, …]) : Return a random sample of items from an axis of object.
scatter_by_map(map_index[, map_size, keep_index]) : Scatter to a list of dataframes.
searchsorted(values[, side, ascending, …]) : Find indices where elements should be inserted to maintain order.
shift([periods, freq, axis, fill_value]) : Shift values by periods positions.
sin() : Get Trigonometric sine, element-wise.
sort_values([return_indexer, ascending, key]) : Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
sqrt() : Get the non-negative square-root of all elements, element-wise.
sum() : Return the sum of all values of the Index.
take(indices) : Gather only the specified subset of indices.
tan() : Get Trigonometric tangent, element-wise.
tile(count) : Repeats the rows from self DataFrame count times to form a new DataFrame.
to_array([fillna]) : Get a dense numpy array for the data.
to_arrow() : Convert Index to a PyArrow Array.
to_dlpack() : Converts a cuDF object into a DLPack tensor.
to_frame([index, name]) : Create a DataFrame with a column containing this Index.
to_pandas() : Convert to a Pandas Index.
to_series([index, name]) : Create a Series with both index and values equal to the index keys.
unique() : Return unique values in the index.
where(cond[, other]) : Replace values where the condition is False.
replace
-
acos()
Get Trigonometric inverse cosine, element-wise.
The inverse of cos so that, if y = x.cos(), then x = y.acos() for x in [0, π].
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5]) >>> ser.acos() 0 3.141593 1 1.570796 2 0.000000 3 1.240482 4 1.047198 dtype: float64
acos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5], ... 'second': [0.234, 0.3, 0.1]}) >>> df first second 0 -1.0 0.234 1 0.0 0.300 2 0.5 0.100 >>> df.acos() first second 0 3.141593 1.334606 1 1.570796 1.266104 2 1.047198 1.470629
acos operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64') >>> index.acos() Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0, 1.5707963267948966, 1.266103672779499], dtype='float64')
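The inverse relationship only holds on the principal range of acos. A quick check with Python's standard math module (used here in place of cuDF, assuming cuDF's element-wise acos follows the same principal-value convention) shows what happens outside [0, π]:

```python
import math


def acos_of_cos(x):
    """Round-trip x through cos and then acos."""
    return math.acos(math.cos(x))


# Inside [0, pi] the round trip recovers x (up to rounding).
print(acos_of_cos(0.5))
# Outside it, acos returns the equivalent angle in [0, pi]:
# cos is even, so -0.5 maps to 0.5, and 4.0 maps to 2*pi - 4.0.
print(acos_of_cos(-0.5), acos_of_cos(4.0), 2 * math.pi - 4.0)
```

The same caveat applies to asin and atan, whose principal ranges are [-π/2, π/2] and (-π/2, π/2) respectively.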
-
any()
Return whether any element is True in the Index.
-
append(other)
Append a collection of Index objects together.
- Parameters
- other : Index or list/tuple of indices
- Returns
- appended : Index
Examples
>>> import cudf >>> idx = cudf.Index([1, 2, 10, 100]) >>> idx Int64Index([1, 2, 10, 100], dtype='int64') >>> other = cudf.Index([200, 400, 50]) >>> other Int64Index([200, 400, 50], dtype='int64') >>> idx.append(other) Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')
append also accepts a list of Index objects:
>>> idx.append([other, other]) Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
-
argsort(ascending=True, **kwargs)
Return the integer indices that would sort the index.
- Parameters
- ascending : bool, default True
If True, returns the indices for ascending order. If False, returns the indices for descending order.
- Returns
- array : A cupy array containing integer indices that would sort the index if used as an indexer.
-
asin()
Get Trigonometric inverse sine, element-wise.
The inverse of sine so that, if y = x.sin(), then x = y.asin() for x in [-π/2, π/2].
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5]) >>> ser.asin() 0 -1.570796 1 0.000000 2 1.570796 3 0.330314 4 0.523599 dtype: float64
asin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5], ... 'second': [0.234, 0.3, 0.1]}) >>> df first second 0 -1.0 0.234 1 0.0 0.300 2 0.5 0.100 >>> df.asin() first second 0 -1.570796 0.236190 1 0.000000 0.304693 2 0.523599 0.100167
asin operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64') >>> index.asin() Float64Index([-1.5707963267948966, 0.41151684606748806, 1.5707963267948966, 0.3046926540153975], dtype='float64')
-
astype(dtype, copy=False)
Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.
- Parameters
- dtype : numpy dtype
The numpy.dtype to cast the entire Index object to.
- copy : bool, default False
If copy is True, a newly allocated object is always returned. If copy is False (the default) and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.
- Returns
- Index
Index with values cast to specified dtype.
-
atan()
Get Trigonometric inverse tangent, element-wise.
The inverse of tan so that, if y = x.tan(), then x = y.atan() for x in (-π/2, π/2).
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10]) >>> ser 0 -1.00000 1 0.00000 2 1.00000 3 0.32434 4 0.50000 5 -10.00000 dtype: float64 >>> ser.atan() 0 -0.785398 1 0.000000 2 0.785398 3 0.313635 4 0.463648 5 -1.471128 dtype: float64
atan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5], ... 'second': [0.234, 0.3, 10]}) >>> df first second 0 -1.0 0.234 1 -10.0 0.300 2 0.5 10.000 >>> df.atan() first second 0 -0.785398 0.229864 1 -1.471128 0.291457 2 0.463648 1.471128
atan operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64') >>> index.atan() Float64Index([-0.7853981633974483, 0.3805063771123649, 0.7853981633974483, 0.0, 0.2914567944778671], dtype='float64')
-
clip(lower=None, upper=None, inplace=False, axis=1)
Trim values at input threshold(s).
Assigns values outside the boundary to the boundary values. Thresholds can be singular values or array-like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.
- Parameters
- lower : scalar or array_like, default None
Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.
- upper : scalar or array_like, default None
Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series/Index, upper is expected to be a scalar or an array of size 1.
- inplace : bool, default False
Whether to perform the operation in place on the data.
- Returns
- Clipped DataFrame/Series/Index/MultiIndex
Examples
>>> import cudf >>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']}) >>> df.clip(lower=[2, 'b'], upper=[3, 'c']) a b 0 2 b 1 2 b 2 3 c 3 3 c
>>> df.clip(lower=None, upper=[3, 'c']) a b 0 1 a 1 2 b 2 3 c 3 3 c
>>> df.clip(lower=[2, 'b'], upper=None) a b 0 2 b 1 2 b 2 3 c 3 4 d
>>> df.clip(lower=2, upper=3, inplace=True) >>> df a b 0 2 2 1 2 3 2 3 3 3 3 3
>>> import cudf >>> sr = cudf.Series([1, 2, 3, 4]) >>> sr.clip(lower=2, upper=3) 0 2 1 2 2 3 3 3 dtype: int64
>>> sr.clip(lower=None, upper=3) 0 1 1 2 2 3 3 3 dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True) >>> sr 0 2 1 2 2 3 3 4 dtype: int64
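The scalar case used in the Series examples can be modelled directly in plain Python. This sketch (clip_model is hypothetical) treats a None bound as "no clipping on that side" and does not model the per-column list thresholds used with DataFrames.

```python
def clip_model(values, lower=None, upper=None):
    """Hypothetical model of scalar clip() on a Series or Index."""
    out = []
    for v in values:
        # A bound of None means "no clipping on that side".
        if lower is not None and v < lower:
            v = lower
        if upper is not None and v > upper:
            v = upper
        out.append(v)
    return out


sr = [1, 2, 3, 4]
print(clip_model(sr, lower=2, upper=3))  # [2, 2, 3, 3]
print(clip_model(sr, upper=3))           # [1, 2, 3, 3]
print(clip_model(sr, lower=2))           # [2, 2, 3, 4]
```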
-
copy(deep=True)
Make a copy of this object.
- Parameters
- deep : bool, default True
Make a deep copy of the data. With deep=False, the data is not copied.
- Returns
- copy : Index
-
cos()
Get Trigonometric cosine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360]) >>> ser 0 0.00000 1 0.32434 2 0.50000 3 45.00000 4 90.00000 5 180.00000 6 360.00000 dtype: float64 >>> ser.cos() 0 1.000000 1 0.947861 2 0.877583 3 0.525322 4 -0.448074 5 -0.598460 6 -0.283691 dtype: float64
cos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15], ... 'second': [100.0, 360, 720, 300]}) >>> df first second 0 0.0 100.0 1 5.0 360.0 2 10.0 720.0 3 15.0 300.0 >>> df.cos() first second 0 1.000000 0.862319 1 0.283662 -0.283691 2 -0.839072 -0.839039 3 -0.759688 -0.022097
cos operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90]) >>> index Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64') >>> index.cos() Float64Index([ 0.9210609940028851, 0.8623188722876839, -0.5984600690578581, -0.4480736161291701], dtype='float64')
-
difference(other, sort=None)
Return a new Index with elements from the index that are not in other.
This is the set difference of two Index objects.
- Parameters
- other : Index or array-like
- sort : False or None, default None
Whether to sort the resulting index. By default, the values are attempted to be sorted, but any TypeError from incomparable elements is caught by cudf.
None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.
False : Do not sort the result.
- Returns
- difference : Index
Examples
>>> import cudf >>> idx1 = cudf.Index([2, 1, 3, 4]) >>> idx1 Int64Index([2, 1, 3, 4], dtype='int64') >>> idx2 = cudf.Index([3, 4, 5, 6]) >>> idx2 Int64Index([3, 4, 5, 6], dtype='int64') >>> idx1.difference(idx2) Int64Index([1, 2], dtype='int64') >>> idx1.difference(idx2, sort=False) Int64Index([2, 1], dtype='int64')
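A pure-Python sketch of the two sort modes, assuming set semantics (each value kept once) and modelling the documented "catch TypeError" fallback when sort is None. difference_model is a hypothetical name; this is not cuDF's implementation.

```python
def difference_model(values, other, sort=None):
    """Hypothetical model of Index.difference(other, sort=...)."""
    exclude = set(other)
    seen = set()
    result = []
    for v in values:
        # Set semantics: keep each value once, in order of first appearance.
        if v not in exclude and v not in seen:
            seen.add(v)
            result.append(v)
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            pass  # incomparable elements: leave the result unsorted
    return result


print(difference_model([2, 1, 3, 4], [3, 4, 5, 6]))             # [1, 2]
print(difference_model([2, 1, 3, 4], [3, 4, 5, 6], sort=False))  # [2, 1]
```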
-
drop_duplicates(keep='first')
Return Index with duplicate values removed.
- Parameters
- keep : {‘first’, ‘last’, False}, default ‘first’
- ‘first’ : Drop duplicates except for the first occurrence.
- ‘last’ : Drop duplicates except for the last occurrence.
- False : Drop all duplicates.
- Returns
- deduplicated : Index
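The following pure-Python sketch shows which occurrence each keep option retains. drop_duplicates_model is hypothetical, and the order of the returned values is an assumption of this model that may differ from what cuDF returns.

```python
from collections import Counter


def drop_duplicates_model(values, keep="first"):
    """Hypothetical model of Index.drop_duplicates(keep=...)."""
    if keep == "first":
        # dict preserves insertion order, so the first occurrence wins.
        return list(dict.fromkeys(values))
    if keep == "last":
        # Scan from the end so the last occurrence wins, then restore order.
        return list(dict.fromkeys(reversed(values)))[::-1]
    if keep is False:
        # Drop every value that occurs more than once.
        counts = Counter(values)
        return [v for v in values if counts[v] == 1]
    raise ValueError("keep must be 'first', 'last' or False")


vals = [1, 2, 1, 3]
print(drop_duplicates_model(vals))          # [1, 2, 3]
print(drop_duplicates_model(vals, "last"))  # [2, 1, 3]
print(drop_duplicates_model(vals, False))   # [2, 3]
```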
-
dropna(how='any')
Return an Index with null values removed.
- Parameters
- how : {‘any’, ‘all’}, default ‘any’
If the Index is a MultiIndex, drop the value when any or all levels are NaN.
- Returns
- valid : Index
Examples
>>> import cudf >>> index = cudf.Index(['a', None, 'b', 'c']) >>> index StringIndex(['a' None 'b' 'c'], dtype='object') >>> index.dropna() StringIndex(['a' 'b' 'c'], dtype='object')
Using dropna on a MultiIndex:
>>> midx = cudf.MultiIndex( ... levels=[[1, None, 4, None], [1, 2, 5]], ... codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]], ... names=["x", "y"], ... ) >>> midx MultiIndex(levels=[0 1 1 null 2 4 3 null dtype: int64, 0 1 1 2 2 5 dtype: int64], codes= x y 0 0 0 1 0 2 2 1 1 3 2 1 4 3 0) >>> midx.dropna() MultiIndex(levels=[0 1 1 4 dtype: int64, 0 1 1 2 2 5 dtype: int64], codes= x y 0 0 0 1 0 2 2 1 1)
-
property
dtype
dtype of the underlying values in GenericIndex.
-
property
empty
Indicator whether Index is empty.
True if Index is entirely empty (no items).
- Returns
- out : bool
If Index is empty, return True; otherwise, return False.
-
equals(other)
Determine if two Index objects contain the same elements.
- Returns
- out : bool
True if “other” is an Index and it has the same elements as calling index; False otherwise.
-
exp()
Get the exponential of all elements, element-wise.
Exponential is the inverse of the log function, so that x.exp().log() = x.
- Returns
- DataFrame/Series/Index
Result of the element-wise exponential.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100]) >>> ser 0 -1.00000 1 0.00000 2 1.00000 3 0.32434 4 0.50000 5 -10.00000 6 100.00000 dtype: float64 >>> ser.exp() 0 3.678794e-01 1 1.000000e+00 2 2.718282e+00 3 1.383117e+00 4 1.648721e+00 5 4.539993e-05 6 2.688117e+43 dtype: float64
exp operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5], ... 'second': [0.234, 0.3, 10]}) >>> df first second 0 -1.0 0.234 1 -10.0 0.300 2 0.5 10.000 >>> df.exp() first second 0 0.367879 1.263644 1 0.000045 1.349859 2 1.648721 22026.465795
exp operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64') >>> index.exp() Float64Index([0.36787944117144233, 1.4918246976412703, 2.718281828459045, 1.0, 1.3498588075760032], dtype='float64')
-
fillna(value, downcast=None)
Fill null values with the specified value.
- Parameters
- value : scalar
Scalar value to use to fill nulls. This value cannot be list-like.
- downcast : dict, default is None
This parameter is currently NON-FUNCTIONAL.
- Returns
- filled : Index
Examples
>>> import cudf >>> index = cudf.Index([1, 2, None, 4]) >>> index Int64Index([1, 2, null, 4], dtype='int64') >>> index.fillna(3) Int64Index([1, 2, 3, 4], dtype='int64')
-
find_label_range(first, last)
Find range that starts with first and ends with last, inclusively.
- Returns
- begin, end : 2-tuple of int
The starting index and the ending index. The last value occurs at position end - 1.
-
classmethod
from_pandas(index, nan_as_null=None)
Convert from a Pandas Index.
- Parameters
- index : Pandas Index object
A Pandas Index object which has to be converted to cuDF Index.
- nan_as_null : bool, default None
If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.
- Raises
- TypeError for invalid input type.
Examples
>>> import cudf >>> import pandas as pd >>> import numpy as np >>> data = [10, 20, 30, np.nan] >>> pdi = pd.Index(data) >>> cudf.core.index.Index.from_pandas(pdi) Index(['10.0', '20.0', '30.0', 'null'], dtype='object') >>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False) Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
-
get_level_values(level)
Return an Index of values for requested level.
This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.
- Parameters
- level : int or str
It is either the integer position or the name of the level.
- Returns
- Index
Calling object, as there is only one level in the Index.
See also
cudf.core.multiindex.MultiIndex.get_level_values : Get values for a level of a MultiIndex.
Notes
For Index, level should be 0, since there are no multiple levels.
Examples
>>> import cudf >>> idx = cudf.core.index.StringIndex(["a","b","c"]) >>> idx.get_level_values(0) StringIndex(['a' 'b' 'c'], dtype='object')
-
get_slice_bound(label, side, kind)
Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.
- Parameters
- label : object
- side : {‘left’, ‘right’}
- kind : {‘ix’, ‘loc’, ‘getitem’}
- Returns
- int
Index of label.
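For an ascending index, the two sides correspond to Python's bisect_left and bisect_right. This is a sketch under that assumption (a sorted list stands in for the Index, and the kind parameter is not modelled); get_slice_bound_model is a hypothetical name.

```python
import bisect


def get_slice_bound_model(labels, label, side):
    """Hypothetical model of get_slice_bound on an ascending list of labels."""
    if side == "left":
        # Leftmost position at which label could be inserted.
        return bisect.bisect_left(labels, label)
    if side == "right":
        # One past the rightmost occurrence of label.
        return bisect.bisect_right(labels, label)
    raise ValueError("side must be 'left' or 'right'")


labels = [10, 20, 20, 30]
print(get_slice_bound_model(labels, 20, "left"))   # 1
print(get_slice_bound_model(labels, 20, "right"))  # 3
```

Under the same model, the (begin, end) pair described for find_label_range(first, last) corresponds to the left bound of first and the right bound of last, which is why the last label in the range sits at position end - 1.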
-
property
gpu_values
View the data as a numba device array object.
-
interleave_columns()
Interleave Series columns of a table into a single column.
Converts the column-major table cols into a row-major column.
- Parameters
- cols : input Table containing columns to interleave.
- Returns
- The interleaved columns as a single column
Examples
>>> import cudf >>> df = cudf.DataFrame({0: ['A1', 'A2', 'A3'], 1: ['B1', 'B2', 'B3']}) >>> df 0 1 0 A1 B1 1 A2 B2 2 A3 B3 >>> df.interleave_columns() 0 A1 1 B1 2 A2 3 B2 4 A3 5 B3
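Row-major interleaving amounts to reading the table one row at a time. A pure-Python sketch, assuming columns are modelled as equal-length lists (interleave_columns_model is hypothetical, not cuDF API):

```python
def interleave_columns_model(columns):
    """Hypothetical model: interleave equal-length columns row by row."""
    if len({len(col) for col in columns}) > 1:
        raise ValueError("all columns must have the same length")
    # zip(*columns) yields rows; flattening them gives row-major order.
    return [value for row in zip(*columns) for value in row]


cols = [["A1", "A2", "A3"], ["B1", "B2", "B3"]]
print(interleave_columns_model(cols))  # ['A1', 'B1', 'A2', 'B2', 'A3', 'B3']
```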
-
property
is_monotonic
Alias for is_monotonic_increasing.
-
property
is_monotonic_decreasing
Return whether the index has monotonic decreasing (only equal or decreasing) values.
-
property
is_monotonic_increasing
Return whether the index has monotonic increasing (only equal or increasing) values.
-
property
is_unique
Return whether the index has unique values.
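The three properties reduce to simple checks over adjacent pairs and distinct values. A pure-Python sketch on a list, with hypothetical function names that mirror the property names (null handling is not modelled):

```python
def is_monotonic_increasing(values):
    # Every adjacent pair is equal or increasing.
    return all(a <= b for a, b in zip(values, values[1:]))


def is_monotonic_decreasing(values):
    # Every adjacent pair is equal or decreasing.
    return all(a >= b for a, b in zip(values, values[1:]))


def is_unique(values):
    # No value appears more than once.
    return len(set(values)) == len(values)


vals = [1, 2, 2, 5]
print(is_monotonic_increasing(vals), is_monotonic_decreasing(vals), is_unique(vals))
# True False False
```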
-
isin(values)
Return a boolean array where the index values are in values.
Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.
- Parameters
- values : set, list-like, Index
Sought values.
- Returns
- is_contained : cupy array
CuPy array of boolean values.
-
isna()
Identify missing values. Alias for isnull.
-
isnull()
Identify missing values.
-
join(other, how='left', level=None, return_indexers=False, sort=False)
Compute join_index and indexers to conform data structures to the new index.
- Parameters
- other : Index
- how : {‘left’, ‘right’, ‘inner’, ‘outer’}
- return_indexers : bool, default False
- sort : bool, default False
Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).
- Returns
- index
Examples
>>> import cudf >>> lhs = cudf.DataFrame( ... {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b'] ... ).index >>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index >>> lhs.join(rhs, how='inner') MultiIndex(levels=[0 1 1 3 dtype: int64, 0 2 1 4 dtype: int64], codes= a b 0 1 1 1 0 0)
-
log()
Get the natural logarithm of all elements, element-wise.
Natural logarithm is the inverse of the exp function, so that x.log().exp() = x for x > 0.
- Returns
- DataFrame/Series/Index
Result of the element-wise natural logarithm.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100]) >>> ser 0 -1.00000 1 0.00000 2 1.00000 3 0.32434 4 0.50000 5 -10.00000 6 100.00000 dtype: float64 >>> ser.log() 0 NaN 1 -inf 2 0.000000 3 -1.125963 4 -0.693147 5 NaN 6 4.605170 dtype: float64
log operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5], ... 'second': [0.234, 0.3, 10]}) >>> df first second 0 -1.0 0.234 1 -10.0 0.300 2 0.5 10.000 >>> df.log() first second 0 NaN -1.452434 1 NaN -1.203973 2 -0.693147 2.302585
log operation on Index:
>>> index = cudf.Index([10, 11, 500.0]) >>> index Float64Index([10.0, 11.0, 500.0], dtype='float64') >>> index.log() Float64Index([2.302585092994046, 2.3978952727983707, 6.214608098422191], dtype='float64')
-
mask(cond, other=None, inplace=False)
Replace values where the condition is True.
- Parameters
- cond : bool Series/DataFrame, array-like
Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.
- other : scalar, list of scalars, Series/DataFrame
Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.
DataFrame expects only a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.
Series expects only a scalar or a Series-like object with the same length.
- inplace : bool, default False
Whether to perform the operation in place on the data.
- Returns
- Same type as caller
Examples
>>> import cudf >>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]}) >>> df.mask(df % 2 == 0, [-1, -1]) A B 0 1 3 1 -1 5 2 5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0]) >>> ser.mask(ser > 2, 10) 0 10 1 10 2 2 3 1 4 0 dtype: int64 >>> ser.mask(ser > 2) 0 null 1 null 2 2 3 1 4 0 dtype: int64
-
max()
Return the maximum value of the Index.
- Returns
- scalar
Maximum value.
See also
Index.min : Return the minimum value in an Index.
cudf.core.series.Series.max : Return the maximum value in a Series.
cudf.core.dataframe.DataFrame.max : Return the maximum values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.max() 3
-
memory_usage(deep=False)
Memory usage of the values.
- Parameters
- deep : bool
Introspect the data deeply, interrogate object dtypes for system-level memory consumption.
- Returns
- bytes used
-
min()
Return the minimum value of the Index.
- Returns
- scalar
Minimum value.
See also
Index.max : Return the maximum value in an Index.
cudf.core.series.Series.min : Return the minimum value in a Series.
cudf.core.dataframe.DataFrame.min : Return the minimum values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.min() 1
-
property
name
Returns the name of the Index.
-
property
names
Returns a tuple containing the name of the Index.
-
property
ndim
Dimension of the data. Apart from MultiIndex, ndim is always 1.
-
notna()
Identify non-missing values. Alias for notnull.
-
notnull()
Identify non-missing values.
-
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)
Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.
- Parameters
- axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Index to direct ranking.
- method : {‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’
How to rank the group of records that have the same value (i.e. ties):
* average: average rank of the group
* min: lowest rank in the group
* max: highest rank in the group
* first: ranks assigned in order they appear in the array
* dense: like ‘min’, but rank always increases by 1 between groups
- numeric_only : bool, optional
For DataFrame objects, rank only numeric columns if set to True.
- na_option : {‘keep’, ‘top’, ‘bottom’}, default ‘keep’
How to rank NaN values:
* keep: assign NaN rank to NaN values
* top: assign smallest rank to NaN values if ascending
* bottom: assign highest rank to NaN values if ascending
- ascending : bool, default True
Whether or not the elements should be ranked in ascending order.
- pct : bool, default False
Whether or not to display the returned rankings in percentile form.
Whether or not to display the returned rankings in percentile form.
- Returns
- same type as caller
Return a Series or DataFrame with data ranks as values.
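The five tie-breaking methods differ only in which rank each member of a tied group receives. A pure-Python sketch, assuming a single column without nulls ranked in ascending order (na_option, pct and axis are not modelled; rank_model is a hypothetical name, not cuDF's implementation):

```python
def rank_model(values, method="average"):
    """Hypothetical model of rank(method=...): one column, no nulls, ascending."""
    # Stable argsort: positions in ascending order, ties kept in original order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start, group = 0, 0
    while start < len(order):
        # Extend `end` over the run of values tied with order[start].
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        group += 1  # 1-based number of distinct values seen so far
        for k in range(start, end + 1):
            if method == "average":
                r = (start + end) / 2 + 1
            elif method == "min":
                r = start + 1
            elif method == "max":
                r = end + 1
            elif method == "first":
                r = k + 1
            elif method == "dense":
                r = group
            else:
                raise ValueError(f"unknown method: {method}")
            ranks[order[k]] = float(r)
        start = end + 1
    return ranks


data = [3, 1, 3, 2]
for m in ["average", "min", "max", "first", "dense"]:
    print(m, rank_model(data, m))
```

With the tied pair of 3s, average gives both 3.5, min gives both 3, max gives both 4, first gives 3 then 4 in order of appearance, and dense gives both 3 because only two distinct smaller values precede them.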
-
rename(name, inplace=False)
Alter Index name.
Defaults to returning new index.
- Parameters
- name : label
Name(s) to set.
- Returns
- Index
-
repeat(repeats, axis=None)
Repeats elements consecutively.
Returns a new object of caller type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.
- Parameters
- repeats : int, or array of ints
The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.
- Returns
- Series/DataFrame/Index
A newly created object of same type as caller with repeated elements.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})
>>> df
   a   b
0  1  10
1  2  20
2  3  30
>>> df.repeat(3)
   a   b
0  1  10
0  1  10
0  1  10
1  2  20
1  2  20
1  2  20
2  3  30
2  3  30
2  3  30
Repeat on Series:
>>> s = cudf.Series([0, 2])
>>> s
0    0
1    2
dtype: int64
>>> s.repeat([3, 4])
0    0
0    0
0    0
1    2
1    2
1    2
1    2
dtype: int64
>>> s.repeat(2)
0    0
0    0
1    2
1    2
dtype: int64
Repeat on Index:
>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33, 33, 33, 33, 33, 55, 55, 55, 55, 55], dtype='int64')
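The per-element semantics can be sketched in plain Python. The helper `repeat_elements` is hypothetical; it accepts either one count for every element or one count per element, as the repeats parameter describes, and rejects negative counts.

```python
from typing import List, Sequence, TypeVar, Union

T = TypeVar("T")


def repeat_elements(values: Sequence[T], repeats: Union[int, Sequence[int]]) -> List[T]:
    """Hypothetical helper: repeat each element consecutively, like repeat()."""
    # A scalar count applies to every element; a sequence gives one count per element.
    counts = [repeats] * len(values) if isinstance(repeats, int) else list(repeats)
    if len(counts) != len(values):
        raise ValueError("repeats must be an int or have one entry per element")
    if any(c < 0 for c in counts):
        raise ValueError("repeats must be non-negative")
    return [v for v, c in zip(values, counts) for _ in range(c)]


print(repeat_elements([0, 2], [3, 4]))  # [0, 0, 0, 2, 2, 2, 2]
print(repeat_elements([0, 2], 2))       # [0, 0, 2, 2]
```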
-
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)¶ Return a random sample of items from an axis of object.
You can use random_state for reproducibility.
- Parameters
- nint, optional
Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.
- fracfloat, optional
Fraction of axis items to return. Cannot be used with n.
- replacebool, default False
Allow or disallow sampling of the same row more than once. replace=True is not yet supported for axis=1/‘columns’.
- weightsstr or ndarray-like, optional
Only supported for axis=1/‘columns’.
- random_stateint or None, default None
Seed for the random number generator (if int), or None. If None, a random seed will be chosen.
- axis{0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index don’t support axis=1.
- Returns
- Series or DataFrame or Index
A new object of same type as caller containing n items randomly sampled from the caller object.
Examples
>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1
>>> sr = cudf.Series([1, 2, 3, 4, 5]) >>> sr.sample(10, replace=True) 1 4 3 1 2 4 0 5 0 1 4 5 4 1 0 2 0 3 3 2 dtype: int64
>>> df = cudf.DataFrame(
...     {"a": [1, 2], "b": [2, 3], "c": [3, 4], "d": [4, 5]})
>>> df.sample(2, axis=1)
   a  c
0  1  3
1  2  4
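The interaction of `n`, `frac`, `replace`, and `random_state` can be sketched with Python's `random` module. This is an illustration, not cuDF's sampler: it samples from a plain list, rounds `frac * len(items)` to obtain `n` (an assumption), and uses a hypothetical helper `sample_items`. The same seed reproduces the same sample within this sketch, not the rows cuDF would return.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_items(items: Sequence[T], n: Optional[int] = None,
                 frac: Optional[float] = None, replace: bool = False,
                 random_state: Optional[int] = None) -> List[T]:
    """Hypothetical helper mirroring the n/frac/replace/random_state rules."""
    if n is not None and frac is not None:
        raise ValueError("specify n or frac, not both")
    if n is None:
        # n defaults to 1 when frac is not given; otherwise round frac * length.
        n = 1 if frac is None else round(frac * len(items))
    rng = random.Random(random_state)  # a fixed seed gives a reproducible draw
    if replace:
        # With replacement, the same item may be drawn more than once.
        return [items[rng.randrange(len(items))] for _ in range(n)]
    if n > len(items):
        raise ValueError("cannot sample more items than exist when replace=False")
    return rng.sample(list(items), n)


first = sample_items([1, 2, 3, 4, 5], 3, random_state=42)
again = sample_items([1, 2, 3, 4, 5], 3, random_state=42)
print(first == again)                                        # True
print(len(sample_items([1, 2, 3, 4, 5], 10, replace=True)))  # 10
```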
-
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)¶ Scatter to a list of dataframes.
Uses map_index to determine the destination of each row of the original DataFrame.
- Parameters
- map_indexSeries, str or list-like
Scatter assignment for each row
- map_sizeint
Length of the output list. Must be greater than or equal to the number of unique values in map_index.
- keep_indexbool
Preserve the original index values for each row.
- Returns
- A list of cudf.DataFrame objects.
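The routing rule can be sketched in plain Python. This is an illustration under assumptions, not cuDF's implementation: `map_index` is taken to be a list of integer destination ids in `[0, map_size)`, each output partition is a list of `(original_label, row)` pairs to mimic `keep_index=True`, and the helper `scatter_rows` is hypothetical. The sketch keeps rows in their original relative order within each partition.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, object]


def scatter_rows(rows: Sequence[Row], map_index: Sequence[int],
                 map_size: Optional[int] = None) -> List[List[Tuple[int, Row]]]:
    """Hypothetical helper: send row i to output partition map_index[i]."""
    if len(map_index) != len(rows):
        raise ValueError("map_index needs one destination per row")
    if map_size is None:
        map_size = max(map_index) + 1 if map_index else 0
    parts: List[List[Tuple[int, Row]]] = [[] for _ in range(map_size)]
    for label, (row, dest) in enumerate(zip(rows, map_index)):
        if not 0 <= dest < map_size:
            raise ValueError(f"destination {dest} outside [0, {map_size})")
        # Carry the original row label along, as keep_index=True does.
        parts[dest].append((label, row))
    return parts


rows = [{"a": 10}, {"a": 20}, {"a": 30}]
print(scatter_rows(rows, [1, 0, 1]))
# [[(1, {'a': 20})], [(0, {'a': 10}), (2, {'a': 30})]]
```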
-
searchsorted(values, side='left', ascending=True, na_position='last')¶ Find indices where elements should be inserted to maintain order.
- Parameters
- valuesFrame (Shape must be consistent with self)
Values to be hypothetically inserted into self.
- sidestr {‘left’, ‘right’} optional, default ‘left’
If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.
- ascendingbool optional, default True
Sorted Frame is in ascending order (otherwise descending).
- na_positionstr {‘last’, ‘first’} optional, default ‘last’
Position of null values in sorted order.
- Returns
- 1-D cupy array of insertion points
Examples
>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)
If the values are not monotonically sorted, wrong locations may be returned:
>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0  # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
...                             'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
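For a single sorted column of numbers, the `side` and `ascending` options map onto Python's `bisect` module. A minimal sketch under that assumption (nulls and multi-column frames are not modeled; `insertion_points` is a hypothetical helper):

```python
import bisect
from typing import List, Sequence


def insertion_points(sorted_values: Sequence[float], values: Sequence[float],
                     side: str = "left", ascending: bool = True) -> List[int]:
    """Hypothetical helper with the same side/ascending meaning as searchsorted."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    # Negating a descending sequence makes it ascending, so bisect still applies.
    keys = list(sorted_values) if ascending else [-v for v in sorted_values]
    find = bisect.bisect_left if side == "left" else bisect.bisect_right
    return [find(keys, v if ascending else -v) for v in values]


print(insertion_points([1, 2, 3], [0, 4]))                # [0, 3]
print(insertion_points([1, 2, 3], [1, 3], side="right"))  # [1, 3]
```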
-
property
shape¶ Returns a tuple representing the dimensionality of the Index.
-
shift(periods=1, freq=None, axis=0, fill_value=None)¶ Shift values by periods positions.
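The entry above does not spell out the semantics, so the following plain-Python sketch assumes pandas-style behavior: positive `periods` move values toward the end, negative ones toward the start, and vacated slots get `fill_value` (with `None` standing in for null). `freq` and `axis` are not modeled, and `shift_values` is a hypothetical helper.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def shift_values(values: Sequence[T], periods: int = 1,
                 fill_value: Optional[T] = None) -> List[Optional[T]]:
    """Hypothetical helper: move values by `periods`, padding with fill_value."""
    n = len(values)
    if abs(periods) >= n:
        return [fill_value] * n  # every value is shifted out
    if periods >= 0:
        # Positive periods shift values toward the end.
        return [fill_value] * periods + list(values[: n - periods])
    # Negative periods shift values toward the start.
    return list(values[-periods:]) + [fill_value] * (-periods)


print(shift_values([1, 2, 3, 4]))         # [None, 1, 2, 3]
print(shift_values([1, 2, 3, 4], -2, 0))  # [3, 4, 0, 0]
```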
-
sin()¶ Get Trigonometric sine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.sin()
0    0.000000
1    0.318683
2    0.479426
3    0.850904
4    0.893997
5   -0.801153
6    0.958916
dtype: float64
sin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.sin()
      first    second
0  0.000000 -0.506366
1 -0.958924  0.958916
2 -0.544021 -0.544072
3  0.650288 -0.999756
sin operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.sin()
Float64Index([-0.3894183423086505, -0.5063656411097588, 0.8011526357338306, 0.8939966636005579], dtype='float64')
-
property
size¶ Return the number of elements in the underlying data.
- Returns
- sizeSize of the DataFrame / Index / Series / MultiIndex
Examples
Size of an empty dataframe is 0.
>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0
DataFrame with values
>>> df = cudf.DataFrame({'a': [10, 11, 12],
...         'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3
Size of an Index
>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Int64Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4
Size of a MultiIndex
>>> midx = cudf.MultiIndex( ... levels=[["a", "b", "c", None], ["1", None, "5"]], ... codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]], ... names=["x", "y"], ... ) >>> midx MultiIndex(levels=[0 a 1 b 2 c 3 None dtype: object, 0 1 1 None 2 5 dtype: object], codes= x y 0 0 0 1 0 2 2 1 1 3 2 1 4 3 0) >>> midx.size 5
-
sort_values(return_indexer=False, ascending=True, key=None)¶ Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
- Parameters
- return_indexerbool, default False
Should the indices that would sort the index be returned.
- ascendingbool, default True
Should the index values be sorted in an ascending order.
- keyNone, optional
This parameter is NON-FUNCTIONAL.
- Returns
- sorted_indexIndex
Sorted copy of the index.
- indexercupy.ndarray, optional
The indices that the index itself was sorted by.
See also
cudf.core.series.Series.sort_valuesSort values of a Series.
cudf.core.dataframe.DataFrame.sort_valuesSort values in a DataFrame.
Examples
>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')
Sort values in ascending order (default behavior).
>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')
Sort values in descending order, and also get the indices idx was sorted by.
>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2], dtype=int32))
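The indexer returned with `return_indexer=True` is an argsort of the original values: gathering the original values at those positions yields the sorted copy. A plain-Python sketch of that relationship (hypothetical helper `sort_with_indexer`; stable sort assumed, nulls not modeled):

```python
from typing import List, Sequence, Tuple


def sort_with_indexer(values: Sequence[int], ascending: bool = True
                      ) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: sorted copy plus the positions that produced it."""
    # argsort: order the positions 0..n-1 by the value stored at each position.
    indexer = sorted(range(len(values)), key=lambda i: values[i],
                     reverse=not ascending)
    # Gathering by the indexer yields the sorted copy; the input is untouched.
    return [values[i] for i in indexer], indexer


print(sort_with_indexer([10, 100, 1, 1000], ascending=False))
# ([1000, 100, 10, 1], [3, 1, 0, 2])
```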
-
sqrt()¶ Get the non-negative square-root of all elements, element-wise.
- Returns
- DataFrame/Series/Index
Result of the non-negative square-root of each element.
Examples
>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64
sqrt operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-10.0, 100, 625],
...                      'second': [1, 2, 0.4]})
>>> df
   first  second
0  -10.0     1.0
1  100.0     2.0
2  625.0     0.4
>>> df.sqrt()
   first    second
0    NaN  1.000000
1   10.0  1.414214
2   25.0  0.632456
sqrt operation on Index:
>>> index = cudf.Index([-10.0, 100, 625])
>>> index
Float64Index([-10.0, 100.0, 625.0], dtype='float64')
>>> index.sqrt()
Float64Index([nan, 10.0, 25.0], dtype='float64')
-
sum()¶ Return the sum of all values of the Index.
- Returns
- scalar
Sum of all values.
Examples
>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.sum()
6
-
take(indices)¶ Gather only the specified subset of indices.
- Parameters
- indices: An array-like of integer positions that select values from this Index.
-
tan()¶ Get Trigonometric tangent, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.tan()
0    0.000000
1    0.336213
2    0.546302
3    1.619775
4   -1.995200
5    1.338690
6   -3.380140
dtype: float64
tan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.tan()
      first     second
0  0.000000  -0.587214
1 -3.380515  -3.380140
2  0.648361   0.648446
3 -0.855993  45.244742
tan operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.tan()
Float64Index([-0.4227932187381618, -0.587213915156929, -1.3386902103511544, -1.995200412208242], dtype='float64')
-
tile(count)¶ Repeats the rows from self DataFrame count times to form a new DataFrame.
- Parameters
- selfinput Table whose rows are tiled.
- countNumber of times to tile “rows”. Must be non-negative.
- Returns
- The table containing the tiled “rows”.
Examples
>>> import cudf
>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
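tile repeats the whole block of rows, whereas repeat repeats each row consecutively. A plain-Python sketch of tile with a hypothetical helper `tile_rows`, treating each tuple as one row:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def tile_rows(rows: Sequence[T], count: int) -> List[T]:
    """Hypothetical helper: concatenate `count` copies of the whole row sequence."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return list(rows) * count


rows = [(8, 4, 7), (5, 2, 3)]
print(tile_rows(rows, 2))
# [(8, 4, 7), (5, 2, 3), (8, 4, 7), (5, 2, 3)]
```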
-
to_array(fillna=None)¶ Get a dense numpy array for the data.
- Parameters
- fillnastr or None
Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.
Notes
If fillna is None, null values are skipped, so the output can be shorter than the input.
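The difference between the two `fillna` modes can be sketched in plain Python, with `None` standing in for a null. The helper `dense_values` is hypothetical, and in this sketch every value becomes a float in `'pandas'` mode because NaN cannot be stored in an integer array; it does not reproduce cuDF's exact dtype rules.

```python
import math
from typing import List, Optional, Sequence, Union


def dense_values(values: Sequence[Optional[int]],
                 fillna: Optional[str] = None) -> List[Union[int, float]]:
    """Hypothetical helper contrasting fillna=None with fillna='pandas'."""
    if fillna is None:
        # Nulls are skipped, so the result can be shorter than the input.
        return [v for v in values if v is not None]
    if fillna == "pandas":
        # Nulls become NaN; since NaN is a float, every value becomes a float here.
        return [math.nan if v is None else float(v) for v in values]
    raise ValueError("fillna must be None or 'pandas'")


print(dense_values([1, None, 3]))            # [1, 3]
print(dense_values([1, None, 3], "pandas"))  # [1.0, nan, 3.0]
```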
-
to_arrow()¶ Convert Index to a PyArrow Array.
Examples
>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx.to_arrow()
<pyarrow.lib.Int64Array object at 0x7fcaa6f53440>
[
  -3,
  10,
  15,
  20
]
-
to_dlpack()¶ Converts a cuDF object into a DLPack tensor.
DLPack is an open-source memory tensor structure: dmlc/dlpack.
This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.
- Parameters
- cudf_objDataFrame, Series, Index, or Column
- Returns
- pycapsule_objPyCapsule
Output DLPack tensor pointer which is encapsulated in a PyCapsule object.
-
to_frame(index=True, name=None)¶ Create a DataFrame with a column containing this Index.
- Parameters
- indexboolean, default True
Set the index of the returned DataFrame as the original Index
- namestr, default None
Name to be used for the column
- Returns
- DataFrame
cudf DataFrame
-
to_pandas()¶ Convert to a Pandas Index.
Examples
>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.numeric.Int64Index'>
>>> type(idx)
<class 'cudf.core.index.GenericIndex'>
-
to_series(index=None, name=None)¶ Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.
- Parameters
- indexIndex, optional
Index of resulting Series. If None, defaults to original index.
- namestr, optional
Name of resulting Series. If None, defaults to name of original index.
- Returns
- Series
The dtype will be based on the type of the Index values.
-
unique()¶ Return unique values in the index.
- Returns
- Index without duplicates
-
property
values¶ Return an array representing the data in the Index.
- Returns
- arrayA cupy array of data in the Index.
Examples
>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values
array([ 1, -10, 100, 20])
>>> type(index.values)
<class 'cupy.core.core.ndarray'>
-
property
values_host¶ Return a numpy representation of the Index.
Only the values in the Index will be returned.
- Returns
- outnumpy.ndarray
The values of the Index.
Examples
>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values_host
array([ 1, -10, 100, 20])
>>> type(index.values_host)
<class 'numpy.ndarray'>
-
where(cond, other=None)¶ Replace values where the condition is False.
- Parameters
- condbool array-like with the same length as self
Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.
- other: scalar, or array-like
Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.
- Returns
- Same type as caller
Examples
>>> import cudf
>>> index = cudf.Index([4, 3, 2, 1, 0])
>>> index
Int64Index([4, 3, 2, 1, 0], dtype='int64')
>>> index.where(index > 2, 15)
Int64Index([4, 3, 15, 15, 15], dtype='int64')
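where keeps values where `cond` is True; mask, listed among the Int64Index methods as replacing values where the condition is True, is its complement. A plain-Python sketch of both, using hypothetical helper names and treating a list or tuple `other` as element-wise and anything else as a scalar:

```python
from typing import List, Sequence


def where_values(values: Sequence[object], cond: Sequence[bool],
                 other: object = None) -> List[object]:
    """Hypothetical helper: keep values where cond is True, else use `other`."""
    if len(cond) != len(values):
        raise ValueError("cond must have the same length as values")
    # A list/tuple `other` is matched element-wise; anything else is a scalar.
    others = list(other) if isinstance(other, (list, tuple)) else [other] * len(values)
    if len(others) != len(values):
        raise ValueError("array-like other must match the length of values")
    return [v if c else o for v, c, o in zip(values, cond, others)]


def mask_values(values: Sequence[object], cond: Sequence[bool],
                other: object = None) -> List[object]:
    """Hypothetical helper: the complement of where_values."""
    return where_values(values, [not c for c in cond], other)


idx = [4, 3, 2, 1, 0]
print(where_values(idx, [v > 2 for v in idx], 15))  # [4, 3, 15, 15, 15]
print(mask_values(idx, [v > 2 for v in idx], 15))   # [15, 15, 2, 1, 0]
```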
Int64Index¶
-
class
cudf.core.index.Int64Index(data=None, dtype=None, copy=False, name=None)¶ Immutable, ordered and sliceable sequence of integer labels. The basic object storing row labels for all cuDF objects.
- Parameters
- dataarray-like (1-dimensional)/ DataFrame
If it is a DataFrame, it will return a MultiIndex
- dtypeNumPy dtype (default: object)
If dtype is None, we find the dtype that best fits the data.
- copybool
Make a copy of input data.
- nameobject
Name to be stored in the index.
- tupleize_colsbool (default: True)
When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.
- Returns
- Index
cudf Index
Examples
>>> import cudf
>>> cudf.Index([1, 2, 3], dtype="uint64", name="a")
UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]})) MultiIndex(levels=[0 1 1 2 dtype: int64, 0 2 1 3 dtype: int64], codes= a b 0 0 0 1 1 1)
- Attributes
dtypedtype of the underlying values in GenericIndex.
emptyIndicator whether Index is empty.
gpu_valuesView the data as a numba device array object
is_monotonicAlias for is_monotonic_increasing.
is_monotonic_decreasingReturn if the index is monotonic decreasing (only equal or decreasing) values.
is_monotonic_increasingReturn if the index is monotonic increasing (only equal or increasing) values.
is_uniqueReturn if the index has unique values.
nameReturns the name of the Index.
namesReturns a tuple containing the name of the Index.
ndimDimension of the data.
shapeReturns a tuple representing the dimensionality of the Index.
sizeReturn the number of elements in the underlying data.
valuesReturn an array representing the data in the Index.
values_hostReturn a numpy representation of the Index.
Methods
acos()Get Trigonometric inverse cosine, element-wise.
any()Return whether any element is True in the Index.
append(other)Append a collection of Index objects together.
argsort([ascending])Return the integer indices that would sort the index.
asin()Get Trigonometric inverse sine, element-wise.
astype(dtype[, copy])Create an Index with values cast to dtypes.
atan()Get Trigonometric inverse tangent, element-wise.
clip([lower, upper, inplace, axis])Trim values at input threshold(s).
copy([deep])Make a copy of this object.
cos()Get Trigonometric cosine, element-wise.
difference(other[, sort])Return a new Index with elements from the index that are not in other.
drop_duplicates([keep])Return Index with duplicate values removed.
dropna([how])Return an Index with null values removed.
equals(other)Determine if two Index objects contain the same elements.
exp()Get the exponential of all elements, element-wise.
fillna(value[, downcast])Fill null values with the specified value.
find_label_range(first, last)Find range that starts with first and ends with last, inclusively.
from_pandas(index[, nan_as_null])Convert from a Pandas Index.
get_level_values(level)Return an Index of values for requested level.
get_slice_bound(label, side, kind)Calculate slice bound that corresponds to given label.
interleave_columns()Interleave Series columns of a table into a single column.
isin(values)Return a boolean array where the index values are in values.
isna()Identify missing values.
isnull()Identify missing values.
join(other[, how, level, return_indexers, sort])Compute join_index and indexers to conform data structures to the new index.
log()Get the natural logarithm of all elements, element-wise.
mask(cond[, other, inplace])Replace values where the condition is True.
max()Return the maximum value of the Index.
memory_usage([deep])Memory usage of the values.
min()Return the minimum value of the Index.
notna()Identify non-missing values.
notnull()Identify non-missing values.
rank([axis, method, numeric_only, …])Compute numerical data ranks (1 through n) along axis.
rename(name[, inplace])Alter Index name.
repeat(repeats[, axis])Repeats elements consecutively.
sample([n, frac, replace, weights, …])Return a random sample of items from an axis of object.
scatter_by_map(map_index[, map_size, keep_index])Scatter to a list of dataframes.
searchsorted(values[, side, ascending, …])Find indices where elements should be inserted to maintain order.
shift([periods, freq, axis, fill_value])Shift values by periods positions.
sin()Get Trigonometric sine, element-wise.
sort_values([return_indexer, ascending, key])Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
sqrt()Get the non-negative square-root of all elements, element-wise.
sum()Return the sum of all values of the Index.
take(indices)Gather only the specified subset of indices.
tan()Get Trigonometric tangent, element-wise.
tile(count)Repeats the rows from self DataFrame count times to form a new DataFrame.
to_array([fillna])Get a dense numpy array for the data.
to_arrow()Convert Index to a PyArrow Array.
to_dlpack()Converts a cuDF object into a DLPack tensor.
to_frame([index, name])Create a DataFrame with a column containing this Index.
to_pandas()Convert to a Pandas Index.
to_series([index, name])Create a Series with both index and values equal to the index keys.
unique()Return unique values in the index.
where(cond[, other])Replace values where the condition is False.
replace
-
acos()¶ Get Trigonometric inverse cosine, element-wise.
The inverse of cos, so that if y = x.cos(), then x = y.acos() for x in [0, π].
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64
acos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629
acos operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0, 1.5707963267948966, 1.266103672779499], dtype='float64')
-
any()¶ Return whether any element is True in the Index.
-
append(other)¶ Append a collection of Index objects together.
- Parameters
- otherIndex or list/tuple of indices
- Returns
- appendedIndex
Examples
>>> import cudf
>>> idx = cudf.Index([1, 2, 10, 100])
>>> idx
Int64Index([1, 2, 10, 100], dtype='int64')
>>> other = cudf.Index([200, 400, 50])
>>> other
Int64Index([200, 400, 50], dtype='int64')
>>> idx.append(other)
Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')
append also accepts a list of Index objects:
>>> idx.append([other, other])
Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
-
argsort(ascending=True, **kwargs)¶ Return the integer indices that would sort the index.
- Parameters
- ascendingbool, default True
If True, returns the indices for ascending order. If False, returns the indices for descending order.
- Returns
- array
A cupy array containing integer indices that would sort the index if used as an indexer.
-
asin()¶ Get Trigonometric inverse sine, element-wise.
The inverse of sine, so that if y = x.sin(), then x = y.asin() for x in [-π/2, π/2].
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.asin()
0   -1.570796
1    0.000000
2    1.570796
3    0.330314
4    0.523599
dtype: float64
asin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.asin()
      first    second
0 -1.570796  0.236190
1  0.000000  0.304693
2  0.523599  0.100167
asin operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64')
>>> index.asin()
Float64Index([-1.5707963267948966, 0.41151684606748806, 1.5707963267948966, 0.3046926540153975], dtype='float64')
-
astype(dtype, copy=False)¶ Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.
- Parameters
- dtypenumpy dtype
Use a numpy.dtype to cast entire Index object to.
- copybool, default False
If True, always return a newly allocated object. If False and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.
- Returns
- Index
Index with values cast to specified dtype.
-
atan()¶ Get Trigonometric inverse tangent, element-wise.
The inverse of tan, so that if y = x.tan(), then x = y.atan() for x in (-π/2, π/2).
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10])
>>> ser
0    -1.00000
1     0.00000
2     1.00000
3     0.32434
4     0.50000
5   -10.00000
dtype: float64
>>> ser.atan()
0   -0.785398
1    0.000000
2    0.785398
3    0.313635
4    0.463648
5   -1.471128
dtype: float64
atan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.atan()
      first    second
0 -0.785398  0.229864
1 -1.471128  0.291457
2  0.463648  1.471128
atan operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.atan()
Float64Index([-0.7853981633974483, 0.3805063771123649, 0.7853981633974483, 0.0, 0.2914567944778671], dtype='float64')
-
clip(lower=None, upper=None, inplace=False, axis=1)¶ Trim values at input threshold(s).
Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.
- Parameters
- lowerscalar or array_like, default None
Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.
- upperscalar or array_like, default None
Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series/Index, upper is expected to be a scalar or an array of size 1.
- inplacebool, default False
Whether to perform the operation in place on the data.
- Returns
- Clipped DataFrame/Series/Index/MultiIndex
Examples
>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4], "b": ['a', 'b', 'c', 'd']})
>>> df.clip(lower=[2, 'b'], upper=[3, 'c'])
   a  b
0  2  b
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=None, upper=[3, 'c'])
   a  b
0  1  a
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=[2, 'b'], upper=None)
   a  b
0  2  b
1  2  b
2  3  c
3  4  d
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4], "b": [2, 3, 4, 5]})
>>> df.clip(lower=2, upper=3, inplace=True)
>>> df
   a  b
0  2  2
1  2  3
2  3  3
3  3  3
>>> import cudf
>>> sr = cudf.Series([1, 2, 3, 4])
>>> sr.clip(lower=2, upper=3)
0    2
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=None, upper=3)
0    1
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True)
>>> sr
0    2
1    2
2    3
3    4
dtype: int64
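A plain-Python sketch of the scalar-threshold case for a Series/Index, using the hypothetical helper `clip_values`; per-column array thresholds for DataFrames are not modeled:

```python
from typing import List, Optional, Sequence


def clip_values(values: Sequence[float], lower: Optional[float] = None,
                upper: Optional[float] = None) -> List[float]:
    """Hypothetical helper: clip each value into [lower, upper]."""
    out = []
    for v in values:
        if lower is not None and v < lower:
            v = lower  # below the minimum threshold
        if upper is not None and v > upper:
            v = upper  # above the maximum threshold
        out.append(v)
    return out


print(clip_values([1, 2, 3, 4], lower=2, upper=3))  # [2, 2, 3, 3]
print(clip_values([1, 2, 3, 4], upper=3))           # [1, 2, 3, 3]
```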
-
copy(deep=True)¶ Make a copy of this object.
- Parameters
- deepbool, default True
Make a deep copy of the data. With deep=False the data is not copied.
- Returns
- copyIndex
-
cos()¶ Get Trigonometric cosine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.cos()
0    1.000000
1    0.947861
2    0.877583
3    0.525322
4   -0.448074
5   -0.598460
6   -0.283691
dtype: float64
cos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.cos()
      first    second
0  1.000000  0.862319
1  0.283662 -0.283691
2 -0.839072 -0.839039
3 -0.759688 -0.022097
cos operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.cos()
Float64Index([ 0.9210609940028851, 0.8623188722876839, -0.5984600690578581, -0.4480736161291701], dtype='float64')
-
difference(other, sort=None)¶ Return a new Index with elements from the index that are not in other.
This is the set difference of two Index objects.
- Parameters
- otherIndex or array-like
- sortFalse or None, default None
Whether to sort the resulting index. By default, cudf attempts to sort the values and catches any TypeError raised by comparing incomparable elements.
None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.
False : Do not sort the result.
- Returns
- differenceIndex
Examples
>>> import cudf
>>> idx1 = cudf.Index([2, 1, 3, 4])
>>> idx1
Int64Index([2, 1, 3, 4], dtype='int64')
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx2
Int64Index([3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
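The set-difference and `sort` rules can be sketched in plain Python. The helper `index_difference` is hypothetical; it keeps each surviving value once in first-seen order, then tries to sort when `sort=None` and falls back to the unsorted order on TypeError, as the parameter description states.

```python
from typing import Iterable, List, Optional


def index_difference(left: Iterable[object], right: Iterable[object],
                     sort: Optional[bool] = None) -> List[object]:
    """Hypothetical helper: values of `left` that do not appear in `right`."""
    seen = set(right)
    result: List[object] = []
    for v in left:
        if v not in seen:  # skip values in `right` and repeats within `left`
            seen.add(v)
            result.append(v)
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            pass  # incomparable elements: keep first-seen order
    return result


print(index_difference([2, 1, 3, 4], [3, 4, 5, 6]))              # [1, 2]
print(index_difference([2, 1, 3, 4], [3, 4, 5, 6], sort=False))  # [2, 1]
```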
-
drop_duplicates(keep='first')¶ Return Index with duplicate values removed
- Parameters
- keep{‘first’, ‘last’, False}, default ‘first’
- ‘first’ : Drop duplicates except for the first occurrence.
- ‘last’ : Drop duplicates except for the last occurrence.
- False : Drop all duplicates.
- Returns
- deduplicatedIndex
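The three `keep` modes can be sketched in plain Python with a hypothetical helper `drop_dupes`. The sketch returns survivors in their original positional order; cuDF's result order is not guaranteed to match.

```python
from collections import Counter
from typing import List, Sequence, Union


def drop_dupes(values: Sequence[object],
               keep: Union[str, bool] = "first") -> List[object]:
    """Hypothetical helper with keep='first' / 'last' / False semantics."""
    if keep is False:
        # Drop every value that occurs more than once.
        counts = Counter(values)
        return [v for v in values if counts[v] == 1]
    if keep == "last":
        # Dedupe the reversed sequence, then restore the original direction.
        return drop_dupes(list(values)[::-1], "first")[::-1]
    if keep == "first":
        seen = set()
        out = []
        for v in values:
            if v not in seen:
                seen.add(v)
                out.append(v)
        return out
    raise ValueError("keep must be 'first', 'last', or False")


print(drop_dupes([1, 2, 1, 3, 2]))          # [1, 2, 3]
print(drop_dupes([1, 2, 1, 3, 2], "last"))  # [1, 3, 2]
print(drop_dupes([1, 2, 1, 3, 2], False))   # [3]
```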
-
dropna(how='any')¶ Return an Index with null values removed.
- Parameters
- how{‘any’, ‘all’}, default ‘any’
If the Index is a MultiIndex, drop the value when any or all levels are NaN.
- Returns
- validIndex
Examples
>>> import cudf
>>> index = cudf.Index(['a', None, 'b', 'c'])
>>> index
StringIndex(['a' None 'b' 'c'], dtype='object')
>>> index.dropna()
StringIndex(['a' 'b' 'c'], dtype='object')
Using dropna on a MultiIndex:
>>> midx = cudf.MultiIndex( ... levels=[[1, None, 4, None], [1, 2, 5]], ... codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]], ... names=["x", "y"], ... ) >>> midx MultiIndex(levels=[0 1 1 null 2 4 3 null dtype: int64, 0 1 1 2 2 5 dtype: int64], codes= x y 0 0 0 1 0 2 2 1 1 3 2 1 4 3 0) >>> midx.dropna() MultiIndex(levels=[0 1 1 4 dtype: int64, 0 1 1 2 2 5 dtype: int64], codes= x y 0 0 0 1 0 2 2 1 1)
-
property
dtype¶ dtype of the underlying values in GenericIndex.
-
property
empty¶ Indicator whether Index is empty.
True if Index is entirely empty (no items).
- Returns
- outbool
If Index is empty, return True, if not return False.
-
equals(other)¶ Determine if two Index objects contain the same elements.
- Returns
- out: bool
True if “other” is an Index and it has the same elements as calling index; False otherwise.
-
exp()¶ Get the exponential of all elements, element-wise.
Exponential is the inverse of the log function, so that x.exp().log() = x
- Returns
- DataFrame/Series/Index
Result of the element-wise exponential.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.exp()
0    3.678794e-01
1    1.000000e+00
2    2.718282e+00
3    1.383117e+00
4    1.648721e+00
5    4.539993e-05
6    2.688117e+43
dtype: float64
exp operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.exp()
      first        second
0  0.367879      1.263644
1  0.000045      1.349859
2  1.648721  22026.465795
exp operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.exp()
Float64Index([0.36787944117144233, 1.4918246976412703, 2.718281828459045, 1.0, 1.3498588075760032], dtype='float64')
-
fillna(value, downcast=None)¶ Fill null values with the specified value.
- Parameters
- valuescalar
Scalar value to use to fill nulls. This value cannot be a list-like.
- downcastdict, default is None
This parameter is currently NON-FUNCTIONAL.
- Returns
- filledIndex
Examples
>>> import cudf
>>> index = cudf.Index([1, 2, None, 4])
>>> index
Int64Index([1, 2, null, 4], dtype='int64')
>>> index.fillna(3)
Int64Index([1, 2, 3, 4], dtype='int64')
-
find_label_range(first, last)¶ Find range that starts with first and ends with last, inclusively.
- Returns
- begin, end2-tuple of int
The starting index and the ending index. The last value occurs at position end - 1.
-
classmethod
from_pandas(index, nan_as_null=None)¶ Convert from a Pandas Index.
- Parameters
- indexPandas Index object
A Pandas Index object which has to be converted to cuDF Index.
- nan_as_nullbool, Default None
If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.
- Raises
- TypeError for invalid input type.
Examples
>>> import cudf
>>> import pandas as pd
>>> import numpy as np
>>> data = [10, 20, 30, np.nan]
>>> pdi = pd.Index(data)
>>> cudf.core.index.Index.from_pandas(pdi)
Index(['10.0', '20.0', '30.0', 'null'], dtype='object')
>>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False)
Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
-
get_level_values(level)¶ Return an Index of values for requested level.
This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.
- Parameters
- levelint or str
It is either the integer position or the name of the level.
- Returns
- Index
Calling object, as there is only one level in the Index.
See also
cudf.core.multiindex.MultiIndex.get_level_valuesGet values for a level of a MultiIndex.
Notes
For Index, level should be 0, since there are no multiple levels.
Examples
>>> import cudf
>>> idx = cudf.core.index.StringIndex(["a", "b", "c"])
>>> idx.get_level_values(0)
StringIndex(['a' 'b' 'c'], dtype='object')
-
get_slice_bound(label, side, kind)¶ Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.
- Parameters
- labelobject
- side{‘left’, ‘right’}
- kind{‘ix’, ‘loc’, ‘getitem’}
- Returns
- int
Index of label.
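For a monotonically increasing index, the two sides correspond to Python's `bisect_left` and `bisect_right`. A minimal sketch under that assumption (the `kind` argument is not modeled and `slice_bound` is a hypothetical name):

```python
import bisect
from typing import Sequence


def slice_bound(sorted_labels: Sequence[int], label: int, side: str) -> int:
    """Hypothetical helper: slice bound for `label` in a sorted index."""
    if side == "left":
        return bisect.bisect_left(sorted_labels, label)  # leftmost position
    if side == "right":
        return bisect.bisect_right(sorted_labels, label)  # one past the rightmost
    raise ValueError("side must be 'left' or 'right'")


labels = [1, 2, 2, 2, 5]
print(slice_bound(labels, 2, "left"), slice_bound(labels, 2, "right"))  # 1 4
```

Used together, the left bound of one label and the right bound of another give a half-open [begin, end) range over the sorted labels.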
-
property
gpu_values¶ View the data as a numba device array object
-
interleave_columns()¶ Interleave Series columns of a table into a single column.
Converts the column major table cols into a row major column.
- Parameters
- colsinput Table containing columns to interleave.
- Returns
- The interleaved columns as a single column
Examples
>>> import cudf
>>> df = cudf.DataFrame({0: ['A1', 'A2', 'A3'], 1: ['B1', 'B2', 'B3']})
>>> df
    0   1
0  A1  B1
1  A2  B2
2  A3  B3
>>> df.interleave_columns()
0    A1
1    B1
2    A2
3    B2
4    A3
5    B3
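Interleaving reads the table row by row. A minimal plain-Python sketch with a hypothetical helper `interleave`, where each inner list is one column:

```python
from typing import List, Sequence


def interleave(columns: Sequence[Sequence[object]]) -> List[object]:
    """Hypothetical helper: flatten column-major data into one row-major list."""
    if len({len(c) for c in columns}) > 1:
        raise ValueError("all columns must have the same length")
    # zip(*columns) yields one tuple per row; emit each row's values in order.
    return [value for row in zip(*columns) for value in row]


print(interleave([["A1", "A2", "A3"], ["B1", "B2", "B3"]]))
# ['A1', 'B1', 'A2', 'B2', 'A3', 'B3']
```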
-
property
is_monotonic¶ Alias for is_monotonic_increasing.
-
property
is_monotonic_decreasing¶ Return if the index is monotonic decreasing (only equal or decreasing) values.
-
property
is_monotonic_increasing¶ Return if the index is monotonic increasing (only equal or increasing) values.
-
property
is_unique¶ Return if the index has unique values.
-
isin(values)¶ Return a boolean array where the index values are in values.
Compute a boolean array indicating whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.
- Parameters
- values : set, list-like, Index
Sought values.
- Returns
- is_contained : cupy array
CuPy array of boolean values.
-
isna()¶ Identify missing values. Alias for isnull.
-
isnull()¶ Identify missing values.
-
join(other, how='left', level=None, return_indexers=False, sort=False)¶ Compute join_index and indexers to conform data structures to the new index.
- Parameters
- other : Index
- how : {‘left’, ‘right’, ‘inner’, ‘outer’}
- return_indexers : bool, default False
- sort : bool, default False
Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).
- Returns
- Index
Examples
>>> import cudf
>>> lhs = cudf.DataFrame(
...     {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b']
... ).index
>>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index
>>> lhs.join(rhs, how='inner')
MultiIndex(levels=[0    1
1    3
dtype: int64, 0    2
1    4
dtype: int64],
codes=   a  b
0  1  1
1  0  0)
-
log()¶ Get the natural logarithm of all elements, element-wise.
The natural logarithm is the inverse of the exp function, so that x.log().exp() equals x for positive x.
- Returns
- DataFrame/Series/Index
Result of the element-wise natural logarithm.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.log()
0         NaN
1        -inf
2    0.000000
3   -1.125963
4   -0.693147
5         NaN
6    4.605170
dtype: float64
log operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.log()
      first    second
0       NaN -1.452434
1       NaN -1.203973
2 -0.693147  2.302585
log operation on Index:
>>> index = cudf.Index([10, 11, 500.0])
>>> index
Float64Index([10.0, 11.0, 500.0], dtype='float64')
>>> index.log()
Float64Index([2.302585092994046, 2.3978952727983707, 6.214608098422191], dtype='float64')
-
mask(cond, other=None, inplace=False)¶ Replace values where the condition is True.
- Parameters
- cond : bool Series/DataFrame, array-like
Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.
- other : scalar, list of scalars, Series/DataFrame
Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.
For a DataFrame, other must be a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.
For a Series, other must be a scalar or a Series-like object of the same length.
- inplace : bool, default False
Whether to perform the operation in place on the data.
- Returns
- Same type as caller
Examples
>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.mask(df % 2 == 0, [-1, -1])
   A  B
0  1  3
1 -1  5
2  5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.mask(ser > 2, 10)
0    10
1    10
2     2
3     1
4     0
dtype: int64
>>> ser.mask(ser > 2)
0    null
1    null
2       2
3       1
4       0
dtype: int64
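The replacement rule for mask can be sketched in pure Python on plain lists. Values where the condition is True take the corresponding entry of other (a scalar is broadcast), and all other values are kept. In this hypothetical helper, None stands in for a null. It illustrates the documented semantics only and is not cuDF's implementation.

```python
def mask(values, cond, other=None):
    """Hypothetical helper: replace values where cond is True with `other`."""
    if len(cond) != len(values):
        raise ValueError("cond must have the same length as values")
    if isinstance(other, list):
        if len(other) != len(values):
            raise ValueError("other must have the same length as values")
        replacements = other
    else:
        replacements = [other] * len(values)  # broadcast a scalar (None means null)
    return [r if c else v for v, c, r in zip(values, cond, replacements)]


ser = [4, 3, 2, 1, 0]
print(mask(ser, [x > 2 for x in ser], 10))  # [10, 10, 2, 1, 0]
print(mask(ser, [x > 2 for x in ser]))      # [None, None, 2, 1, 0]
```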
-
max()¶ Return the maximum value of the Index.
- Returns
- scalar
Maximum value.
See also
Index.min : Return the minimum value in an Index.
cudf.core.series.Series.max : Return the maximum value in a Series.
cudf.core.dataframe.DataFrame.max : Return the maximum values in a DataFrame.
Examples
>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.max()
3
-
memory_usage(deep=False)¶ Memory usage of the values.
- Parameters
- deep : bool
Introspect the data deeply, interrogating object dtypes for system-level memory consumption.
- Returns
- int
Number of bytes used.
-
min()¶ Return the minimum value of the Index.
- Returns
- scalar
Minimum value.
See also
Index.max : Return the maximum value in an Index.
cudf.core.series.Series.min : Return the minimum value in a Series.
cudf.core.dataframe.DataFrame.min : Return the minimum values in a DataFrame.
Examples
>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.min()
1
-
property
name¶ Returns the name of the Index.
-
property
names¶ Returns a tuple containing the name of the Index.
-
property
ndim¶ Number of dimensions of the data. Apart from MultiIndex, ndim is always 1.
-
notna()¶ Identify non-missing values. Alias for notnull.
-
notnull()¶ Identify non-missing values.
-
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)¶ Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.
- Parameters
- axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Index to direct ranking.
- method : {‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’
How to rank the group of records that have the same value (i.e. ties):
* average: average rank of the group
* min: lowest rank in the group
* max: highest rank in the group
* first: ranks assigned in order they appear in the array
* dense: like ‘min’, but rank always increases by 1 between groups.
- numeric_only : bool, optional
For DataFrame objects, rank only numeric columns if set to True.
- na_option : {‘keep’, ‘top’, ‘bottom’}, default ‘keep’
How to rank NaN values:
* keep: assign NaN rank to NaN values
* top: assign smallest rank to NaN values if ascending
* bottom: assign highest rank to NaN values if ascending.
- ascending : bool, default True
Whether or not the elements should be ranked in ascending order.
- pct : bool, default False
Whether or not to display the returned rankings in percentile form.
- Returns
- same type as caller
Return a Series or DataFrame with data ranks as values.
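The tie-handling methods listed above can be compared with the pure-Python sketch below. The helper rank is hypothetical. It ranks a plain list in ascending order, assumes no nulls, and ignores na_option, pct, and axis. It illustrates the definitions only and does not reproduce cuDF's GPU implementation.

```python
def rank(values, method="average"):
    """Hypothetical helper: 1-based ascending ranks for a list without nulls."""
    order = sorted(range(len(values)), key=lambda i: values[i])  # stable sort
    ranks = [0.0] * len(values)
    group = 0  # number of distinct values seen so far (used by 'dense')
    start = 0
    while start < len(order):
        # Extend `end` over the run of tied values order[start:end].
        end = start
        while end < len(order) and values[order[end]] == values[order[start]]:
            end += 1
        group += 1
        for k in range(start, end):
            i = order[k]
            if method == "average":
                ranks[i] = (start + 1 + end) / 2  # mean of ranks start+1 .. end
            elif method == "min":
                ranks[i] = float(start + 1)
            elif method == "max":
                ranks[i] = float(end)
            elif method == "first":
                ranks[i] = float(k + 1)  # ties ranked by order of appearance
            elif method == "dense":
                ranks[i] = float(group)
            else:
                raise ValueError(f"unknown method: {method!r}")
        start = end
    return ranks


data = [30, 10, 20, 20]
print(rank(data, "average"))  # [4.0, 1.0, 2.5, 2.5]
print(rank(data, "min"))      # [4.0, 1.0, 2.0, 2.0]
print(rank(data, "max"))      # [4.0, 1.0, 3.0, 3.0]
print(rank(data, "first"))    # [4.0, 1.0, 2.0, 3.0]
print(rank(data, "dense"))    # [3.0, 1.0, 2.0, 2.0]
```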
-
rename(name, inplace=False)¶ Alter Index name.
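Defaults to returning a new index.
- Parameters
- name : label
Name(s) to set.
- inplace : bool, default False
If True, modify the Index in place instead of returning a new one.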
- Returns
- Index
-
repeat(repeats, axis=None)¶ Repeats elements consecutively.
Returns a new object of the caller's type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.
- Parameters
- repeats : int, or array of ints
The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.
- Returns
- Series/DataFrame/Index
A newly created object of same type as caller with repeated elements.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})
>>> df
   a   b
0  1  10
1  2  20
2  3  30
>>> df.repeat(3)
   a   b
0  1  10
0  1  10
0  1  10
1  2  20
1  2  20
1  2  20
2  3  30
2  3  30
2  3  30
Repeat on Series
>>> s = cudf.Series([0, 2])
>>> s
0    0
1    2
dtype: int64
>>> s.repeat([3, 4])
0    0
0    0
0    0
1    2
1    2
1    2
1    2
dtype: int64
>>> s.repeat(2)
0    0
0    0
1    2
1    2
dtype: int64
Repeat on Index
>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33, 33, 33, 33, 33, 55, 55, 55, 55, 55], dtype='int64')
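The way repeats is applied, either one count for every element or one count per element, can be sketched in pure Python. The helper repeat below is hypothetical and operates on plain lists rather than cuDF objects.

```python
def repeat(values, repeats):
    """Hypothetical helper: repeat each element consecutively.

    `repeats` is either one non-negative int applied to every element,
    or a list giving a separate count for each element.
    """
    if isinstance(repeats, int):
        repeats = [repeats] * len(values)
    if len(repeats) != len(values):
        raise ValueError("repeats must match the number of elements")
    if any(r < 0 for r in repeats):
        raise ValueError("repeat counts must be non-negative")
    return [v for v, r in zip(values, repeats) for _ in range(r)]


print(repeat([0, 2], [3, 4]))  # [0, 0, 0, 2, 2, 2, 2]
print(repeat([0, 2], 2))       # [0, 0, 2, 2]
print(repeat([0, 2], 0))       # []
```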
-
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)¶ Return a random sample of items from an axis of object.
You can use random_state for reproducibility.
- Parameters
- n : int, optional
Number of items from axis to return. Cannot be used with frac. Defaults to 1 if frac is None.
- frac : float, optional
Fraction of axis items to return. Cannot be used with n.
- replace : bool, default False
Allow or disallow sampling of the same row more than once. replace=True is not yet supported for axis=1/‘columns’.
- weights : str or ndarray-like, optional
Only supported for axis=1/‘columns’.
- random_state : int or None, default None
Seed for the random number generator (if int), or None. If None, a random seed will be chosen.
- axis : {0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index do not support axis=1.
- Returns
- Series or DataFrame or Index
A new object of same type as caller containing n items randomly sampled from the caller object.
Examples
>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1
>>> sr = cudf.Series([1, 2, 3, 4, 5])
>>> sr.sample(10, replace=True)
1    4
3    1
2    4
0    5
0    1
4    5
4    1
0    2
0    3
3    2
dtype: int64
>>> df = cudf.DataFrame(
...     {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]})
>>> df.sample(2, axis=1)
   a  c
0  1  3
1  2  4
-
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)¶ Scatter to a list of dataframes.
Uses map_index to determine the destination of each row of the original DataFrame.
- Parameters
- map_index : Series, str or list-like
Scatter assignment for each row.
- map_size : int
Length of the output list. Must be greater than or equal to the number of unique values in map_index.
- keep_index : bool
Whether to preserve the original index values for each row.
- Returns
- A list of cudf.DataFrame objects.
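The routing performed by scatter_by_map can be sketched in pure Python. Each row goes to the output list selected by its entry in map_index. The helper below is hypothetical. It assumes map_index holds integer partition ids in [0, map_size) and that unused ids produce empty partitions, which is a choice made for this sketch. It does not handle a column name as map_index, as cuDF does, and it is not cuDF's implementation.

```python
def scatter_by_map(rows, map_index, map_size=None):
    """Hypothetical helper: route row i to output partition map_index[i]."""
    if len(map_index) != len(rows):
        raise ValueError("map_index must have one entry per row")
    if map_size is None:
        # Smallest size that can hold every partition id.
        map_size = max(map_index, default=-1) + 1
    if any(dest < 0 or dest >= map_size for dest in map_index):
        raise ValueError("every map_index value must be in [0, map_size)")
    parts = [[] for _ in range(map_size)]
    for row, dest in zip(rows, map_index):
        parts[dest].append(row)  # rows keep their relative order
    return parts


rows = ["r0", "r1", "r2", "r3"]
print(scatter_by_map(rows, [1, 0, 1, 2]))     # [['r1'], ['r0', 'r2'], ['r3']]
print(scatter_by_map(rows, [1, 0, 1, 2], 4))  # [['r1'], ['r0', 'r2'], ['r3'], []]
```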
-
searchsorted(values, side='left', ascending=True, na_position='last')¶ Find indices where elements should be inserted to maintain order.
- Parameters
- values : Frame (shape must be consistent with self)
Values to be hypothetically inserted into self.
- side : str {‘left’, ‘right’}, optional, default ‘left’
If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.
- ascending : bool, optional, default True
Whether self is sorted in ascending order (otherwise descending).
- na_position : str {‘last’, ‘first’}, optional, default ‘last’
Position of null values in sorted order.
- Returns
- 1-D cupy array of insertion points
Examples
>>> import cudf
>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)
If the values are not monotonically sorted, wrong locations may be returned:
>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0  # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
...                             'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
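For data sorted in ascending order, side='left' and side='right' correspond to the standard bisect_left and bisect_right binary searches. Rows of a multi-column frame compare lexicographically, like Python tuples. The pure-Python sketch below assumes ascending order and no nulls, so na_position is not modeled. The helper searchsorted is hypothetical, not cuDF's GPU implementation.

```python
from bisect import bisect_left, bisect_right


def searchsorted(sorted_values, values, side="left"):
    """Hypothetical helper: insertion points that keep `sorted_values` ascending."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    find = bisect_left if side == "left" else bisect_right
    return [find(sorted_values, v) for v in values]


s = [1, 2, 3]
print(searchsorted(s, [0, 4]))                # [0, 3]
print(searchsorted(s, [1, 3], side="left"))   # [0, 2]
print(searchsorted(s, [1, 3], side="right"))  # [1, 3]

# Multi-column rows compare lexicographically, like tuples.
rows = [(1, 10), (3, 12), (5, 14), (7, 16)]
print(searchsorted(rows, [(0, 10), (2, 11), (5, 13), (6, 15)]))  # [0, 1, 2, 3]
```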
-
property
shape¶ Returns a tuple representing the dimensionality of the Index.
-
shift(periods=1, freq=None, axis=0, fill_value=None)¶ Shift values by periods positions.
-
sin()¶ Get Trigonometric sine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.sin()
0    0.000000
1    0.318683
2    0.479426
3    0.850904
4    0.893997
5   -0.801153
6    0.958916
dtype: float64
sin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.sin()
      first    second
0  0.000000 -0.506366
1 -0.958924  0.958916
2 -0.544021 -0.544072
3  0.650288 -0.999756
sin operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.sin()
Float64Index([-0.3894183423086505, -0.5063656411097588, 0.8011526357338306, 0.8939966636005579], dtype='float64')
-
property
size¶ Return the number of elements in the underlying data.
- Returns
- size : int
Size of the DataFrame / Index / Series / MultiIndex.
Examples
Size of an empty dataframe is 0.
>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0
DataFrame with values
>>> df = cudf.DataFrame({'a': [10, 11, 12],
...         'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3
Size of an Index
>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Int64Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4
Size of a MultiIndex
>>> midx = cudf.MultiIndex(
...         levels=[["a", "b", "c", None], ["1", None, "5"]],
...         codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...         names=["x", "y"],
...     )
>>> midx
MultiIndex(levels=[0       a
1       b
2       c
3    None
dtype: object, 0       1
1    None
2       5
dtype: object],
codes=   x  y
0  0  0
1  0  2
2  1  1
3  2  1
4  3  0)
>>> midx.size
5
-
sort_values(return_indexer=False, ascending=True, key=None)¶ Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
- Parameters
- return_indexer : bool, default False
Whether to also return the indices that would sort the index.
- ascending : bool, default True
Whether to sort the index values in ascending order.
- key : None, optional
This parameter is currently non-functional.
- Returns
- sorted_index : Index
Sorted copy of the index.
- indexer : cupy.ndarray, optional
The indices that the index itself was sorted by.
See also
cudf.core.series.Series.sort_values : Sort values of a Series.
cudf.core.dataframe.DataFrame.sort_values : Sort values in a DataFrame.
Examples
>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')
Sort values in ascending order (default behavior).
>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')
Sort values in descending order, and also get the indices idx was sorted by.
>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2], dtype=int32))
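The indexer returned with return_indexer=True is an argsort. It lists the original positions in sorted order, so gathering the values at those positions yields the sorted result. The pure-Python sketch below shows this relationship with a hypothetical sort_values helper on a plain list. It is not cuDF's implementation.

```python
def sort_values(values, ascending=True, return_indexer=False):
    """Hypothetical helper: sorted copy, optionally with the positions used."""
    # The indexer holds original positions in sorted order (an argsort).
    indexer = sorted(range(len(values)), key=lambda i: values[i], reverse=not ascending)
    result = [values[i] for i in indexer]
    return (result, indexer) if return_indexer else result


idx = [10, 100, 1, 1000]
print(sort_values(idx))                                        # [1, 10, 100, 1000]
print(sort_values(idx, ascending=False, return_indexer=True))  # ([1000, 100, 10, 1], [3, 1, 0, 2])
```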
-
sqrt()¶ Get the non-negative square-root of all elements, element-wise.
- Returns
- DataFrame/Series/Index
Result of the non-negative square-root of each element.
Examples
>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64
sqrt operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-10.0, 100, 625],
...                      'second': [1, 2, 0.4]})
>>> df
   first  second
0  -10.0     1.0
1  100.0     2.0
2  625.0     0.4
>>> df.sqrt()
   first    second
0    NaN  1.000000
1   10.0  1.414214
2   25.0  0.632456
sqrt operation on Index:
>>> index = cudf.Index([-10.0, 100, 625])
>>> index
Float64Index([-10.0, 100.0, 625.0], dtype='float64')
>>> index.sqrt()
Float64Index([nan, 10.0, 25.0], dtype='float64')
-
sum()¶ Return the sum of all values of the Index.
- Returns
- scalar
Sum of all values.
Examples
>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.sum()
6
-
take(indices)¶ Gather only the specific subset of indices.
- Parameters
- indices : array-like
An array-like that maps to values contained in this Index.
-
tan()¶ Get Trigonometric tangent, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.tan()
0    0.000000
1    0.336213
2    0.546302
3    1.619775
4   -1.995200
5    1.338690
6   -3.380140
dtype: float64
tan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.tan()
      first     second
0  0.000000  -0.587214
1 -3.380515  -3.380140
2  0.648361   0.648446
3 -0.855993  45.244742
tan operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.tan()
Float64Index([-0.4227932187381618, -0.587213915156929, -1.3386902103511544, -1.995200412208242], dtype='float64')
-
tile(count)¶ Repeats the rows from self DataFrame count times to form a new DataFrame.
- Parameters
- count : int
Number of times to tile the rows. Must be non-negative.
- Returns
- The table containing the tiled rows.
Examples
>>> import cudf
>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
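tile stacks copies of the whole table, while repeat repeats each row consecutively. The pure-Python sketch below shows the difference on a list of rows. The helper tile is hypothetical and is not cuDF's implementation.

```python
def tile(rows, count):
    """Hypothetical helper: stack `count` copies of all rows, in order."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return [list(row) for _ in range(count) for row in rows]


table = [[8, 4, 7], [5, 2, 3]]
print(tile(table, 2))  # [[8, 4, 7], [5, 2, 3], [8, 4, 7], [5, 2, 3]]

# For contrast, repeating each row consecutively (as repeat does):
print([list(row) for row in table for _ in range(2)])
# [[8, 4, 7], [8, 4, 7], [5, 2, 3], [5, 2, 3]]
```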
-
to_array(fillna=None)¶ Get a dense numpy array for the data.
- Parameters
- fillna : str or None
Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.
Notes
If fillna is None, null values are skipped; therefore, the output size could be smaller.
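The two fillna modes can be illustrated with a pure-Python sketch on a plain list, where None stands in for a null. The helper to_array below is hypothetical. It does not model the dtype promotion described above and is not cuDF's implementation.

```python
import math


def to_array(values, fillna=None):
    """Hypothetical helper; `None` marks a null value.

    fillna=None      -> nulls are skipped, so the result may be shorter
    fillna="pandas"  -> nulls are replaced with NaN
    """
    if fillna is None:
        return [v for v in values if v is not None]
    if fillna == "pandas":
        return [math.nan if v is None else v for v in values]
    raise ValueError("fillna must be None or 'pandas'")


data = [1, None, 3]
print(to_array(data))            # [1, 3]
print(to_array(data, "pandas"))  # [1, nan, 3]
```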
-
to_arrow()¶ Convert Index to a PyArrow Array.
Examples
>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx.to_arrow()
<pyarrow.lib.Int64Array object at 0x7fcaa6f53440>
[
  -3,
  10,
  15,
  20
]
-
to_dlpack()¶ Converts a cuDF object into a DLPack tensor.
DLPack is an open-source memory tensor structure: dmlc/dlpack.
This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.
- Parameters
- cudf_obj : DataFrame, Series, Index, or Column
- Returns
- pycapsule_obj : PyCapsule
Output DLPack tensor pointer which is encapsulated in a PyCapsule object.
-
to_frame(index=True, name=None)¶ Create a DataFrame with a column containing this Index.
- Parameters
- index : boolean, default True
Set the index of the returned DataFrame as the original Index.
- name : str, default None
Name to be used for the column.
- Returns
- DataFrame
cudf DataFrame
-
to_pandas()¶ Convert to a Pandas Index.
Examples
>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.numeric.Int64Index'>
>>> type(idx)
<class 'cudf.core.index.GenericIndex'>
-
to_series(index=None, name=None)¶ Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.
- Parameters
- index : Index, optional
Index of resulting Series. If None, defaults to original index.
- name : str, optional
Name of resulting Series. If None, defaults to name of original index.
- Returns
- Series
The dtype will be based on the type of the Index values.
-
unique()¶ Return unique values in the index.
- Returns
- Index without duplicates
-
property
values¶ Return an array representing the data in the Index.
- Returns
- array : cupy.ndarray
A CuPy array of data in the Index.
Examples
>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values
array([  1, -10, 100,  20])
>>> type(index.values)
<class 'cupy.core.core.ndarray'>
-
property
values_host¶ Return a numpy representation of the Index.
Only the values in the Index will be returned.
- Returns
- out : numpy.ndarray
The values of the Index.
Examples
>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values_host
array([  1, -10, 100,  20])
>>> type(index.values_host)
<class 'numpy.ndarray'>
-
where(cond, other=None)¶ Replace values where the condition is False.
- Parameters
- cond : bool array-like with the same length as self
Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.
- other : scalar, or array-like
Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.
- Returns
- Same type as caller
Examples
>>> import cudf
>>> index = cudf.Index([4, 3, 2, 1, 0])
>>> index
Int64Index([4, 3, 2, 1, 0], dtype='int64')
>>> index.where(index > 2, 15)
Int64Index([4, 3, 15, 15, 15], dtype='int64')
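where keeps values where the condition is True and replaces the rest. This is the mirror image of mask: where(cond, other) selects the same values as mask applied to the inverted condition. The pure-Python sketch below uses a hypothetical where helper on a plain list, with None standing in for a null. It is not cuDF's implementation.

```python
def where(values, cond, other=None):
    """Hypothetical helper: keep values where cond is True, otherwise use `other`."""
    if len(cond) != len(values):
        raise ValueError("cond must have the same length as values")
    if isinstance(other, list):
        if len(other) != len(values):
            raise ValueError("other must have the same length as values")
        replacements = other
    else:
        replacements = [other] * len(values)  # broadcast a scalar (None means null)
    return [v if c else r for v, c, r in zip(values, cond, replacements)]


index = [4, 3, 2, 1, 0]
print(where(index, [x > 2 for x in index], 15))  # [4, 3, 15, 15, 15]
print(where(index, [x > 2 for x in index]))      # [4, 3, None, None, None]
```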
UInt8Index¶
-
class
cudf.core.index.UInt8Index(data=None, dtype=None, copy=False, name=None)¶ Immutable, ordered and sliceable sequence of integer labels. The basic object storing row labels for all cuDF objects.
- Parameters
- data : array-like (1-dimensional) / DataFrame
If it is a DataFrame, it will return a MultiIndex.
- dtype : NumPy dtype (default: object)
If dtype is None, we find the dtype that best fits the data.
- copy : bool
Make a copy of input data.
- name : object
Name to be stored in the index.
- tupleize_cols : bool (default: True)
When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.
- Returns
- Index
cudf Index
Examples
>>> import cudf
>>> cudf.Index([1, 2, 3], dtype="uint64", name="a")
UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]}))
MultiIndex(levels=[0    1
1    2
dtype: int64, 0    2
1    3
dtype: int64],
codes=   a  b
0  0  0
1  1  1)
- Attributes
dtype : dtype of the underlying values in GenericIndex.
empty : Indicator whether Index is empty.
gpu_values : View the data as a numba device array object.
is_monotonic : Alias for is_monotonic_increasing.
is_monotonic_decreasing : Return whether the index is monotonic decreasing (only equal or decreasing values).
is_monotonic_increasing : Return whether the index is monotonic increasing (only equal or increasing values).
is_unique : Return whether the index has unique values.
name : Returns the name of the Index.
names : Returns a tuple containing the name of the Index.
ndim : Dimension of the data.
shape : Returns a tuple representing the dimensionality of the Index.
size : Return the number of elements in the underlying data.
values : Return an array representing the data in the Index.
values_host : Return a numpy representation of the Index.
Methods
acos() : Get Trigonometric inverse cosine, element-wise.
any() : Return whether any element is True in Index.
append(other) : Append a collection of Index objects together.
argsort([ascending]) : Return the integer indices that would sort the index.
asin() : Get Trigonometric inverse sine, element-wise.
astype(dtype[, copy]) : Create an Index with values cast to dtypes.
atan() : Get Trigonometric inverse tangent, element-wise.
clip([lower, upper, inplace, axis]) : Trim values at input threshold(s).
copy([deep]) : Make a copy of this object.
cos() : Get Trigonometric cosine, element-wise.
difference(other[, sort]) : Return a new Index with elements from the index that are not in other.
drop_duplicates([keep]) : Return Index with duplicate values removed.
dropna([how]) : Return an Index with null values removed.
equals(other) : Determine if two Index objects contain the same elements.
exp() : Get the exponential of all elements, element-wise.
fillna(value[, downcast]) : Fill null values with the specified value.
find_label_range(first, last) : Find range that starts with first and ends with last, inclusively.
from_pandas(index[, nan_as_null]) : Convert from a Pandas Index.
get_level_values(level) : Return an Index of values for requested level.
get_slice_bound(label, side, kind) : Calculate slice bound that corresponds to given label.
interleave_columns() : Interleave Series columns of a table into a single column.
isin(values) : Return a boolean array where the index values are in values.
isna() : Identify missing values.
isnull() : Identify missing values.
join(other[, how, level, return_indexers, sort]) : Compute join_index and indexers to conform data structures to the new index.
log() : Get the natural logarithm of all elements, element-wise.
mask(cond[, other, inplace]) : Replace values where the condition is True.
max() : Return the maximum value of the Index.
memory_usage([deep]) : Memory usage of the values.
min() : Return the minimum value of the Index.
notna() : Identify non-missing values.
notnull() : Identify non-missing values.
rank([axis, method, numeric_only, …]) : Compute numerical data ranks (1 through n) along axis.
rename(name[, inplace]) : Alter Index name.
repeat(repeats[, axis]) : Repeats elements consecutively.
sample([n, frac, replace, weights, …]) : Return a random sample of items from an axis of object.
scatter_by_map(map_index[, map_size, keep_index]) : Scatter to a list of dataframes.
searchsorted(values[, side, ascending, …]) : Find indices where elements should be inserted to maintain order.
shift([periods, freq, axis, fill_value]) : Shift values by periods positions.
sin() : Get Trigonometric sine, element-wise.
sort_values([return_indexer, ascending, key]) : Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
sqrt() : Get the non-negative square-root of all elements, element-wise.
sum() : Return the sum of all values of the Index.
take(indices) : Gather only the specific subset of indices.
tan() : Get Trigonometric tangent, element-wise.
tile(count) : Repeats the rows from self DataFrame count times to form a new DataFrame.
to_array([fillna]) : Get a dense numpy array for the data.
to_arrow() : Convert Index to a PyArrow Array.
to_dlpack() : Converts a cuDF object into a DLPack tensor.
to_frame([index, name]) : Create a DataFrame with a column containing this Index.
to_pandas() : Convert to a Pandas Index.
to_series([index, name]) : Create a Series with both index and values equal to the index keys.
unique() : Return unique values in the index.
where(cond[, other]) : Replace values where the condition is False.
replace
-
acos()¶ Get Trigonometric inverse cosine, element-wise.
The inverse of cos, so that if y = x.cos(), then x = y.acos() for x in [0, π].
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64
acos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629
acos operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0, 1.5707963267948966, 1.266103672779499], dtype='float64')
-
any()¶ Return whether any element is True in Index.
-
append(other)¶ Append a collection of Index objects together.
- Parameters
- other : Index or list/tuple of indices
- Returns
- appended : Index
Examples
>>> import cudf
>>> idx = cudf.Index([1, 2, 10, 100])
>>> idx
Int64Index([1, 2, 10, 100], dtype='int64')
>>> other = cudf.Index([200, 400, 50])
>>> other
Int64Index([200, 400, 50], dtype='int64')
>>> idx.append(other)
Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')
append also accepts a list of Index objects:
>>> idx.append([other, other])
Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
-
argsort(ascending=True, **kwargs)¶ Return the integer indices that would sort the index.
- Parameters
- ascendingbool, default True
If True, returns the indices for ascending order. If False, returns the indices for descending order.
- Returns
- array : cupy.ndarray
Integer indices that would sort the index if used as an indexer.
-
asin()¶ Get Trigonometric inverse sine, element-wise.
The inverse of sine, so that if y = x.sin(), then x = y.asin() for x in [-π/2, π/2].
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.asin()
0   -1.570796
1    0.000000
2    1.570796
3    0.330314
4    0.523599
dtype: float64
asin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.asin()
      first    second
0 -1.570796  0.236190
1  0.000000  0.304693
2  0.523599  0.100167
asin operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64')
>>> index.asin()
Float64Index([-1.5707963267948966, 0.41151684606748806, 1.5707963267948966, 0.3046926540153975], dtype='float64')
-
astype(dtype, copy=False)¶ Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.
- Parameters
- dtype : numpy dtype
Use a numpy.dtype to cast entire Index object to.
- copy : bool, default False
If True, always return a newly allocated object. If False (the default) and the internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.
- Returns
- Index
Index with values cast to specified dtype.
-
atan()¶ Get Trigonometric inverse tangent, element-wise.
The inverse of tan, so that if y = x.tan(), then x = y.atan() for x in (-π/2, π/2).
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10])
>>> ser
0    -1.00000
1     0.00000
2     1.00000
3     0.32434
4     0.50000
5   -10.00000
dtype: float64
>>> ser.atan()
0   -0.785398
1    0.000000
2    0.785398
3    0.313635
4    0.463648
5   -1.471128
dtype: float64
atan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.atan()
      first    second
0 -0.785398  0.229864
1 -1.471128  0.291457
2  0.463648  1.471128
atan operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.atan()
Float64Index([-0.7853981633974483, 0.3805063771123649, 0.7853981633974483, 0.0, 0.2914567944778671], dtype='float64')
-
clip(lower=None, upper=None, inplace=False, axis=1)¶ Trim values at input threshold(s).
Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.
- Parameters
- lower : scalar or array_like, default None
Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.
- upper : scalar or array_like, default None
Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series/Index, upper is expected to be a scalar or an array of size 1.
- inplace : bool, default False
- Returns
- Clipped DataFrame/Series/Index/MultiIndex
Examples
>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']})
>>> df.clip(lower=[2, 'b'], upper=[3, 'c'])
   a  b
0  2  b
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=None, upper=[3, 'c'])
   a  b
0  1  a
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=[2, 'b'], upper=None)
   a  b
0  2  b
1  2  b
2  3  c
3  4  d
>>> df.clip(lower=2, upper=3, inplace=True)
>>> df
   a  b
0  2  2
1  2  3
2  3  3
3  3  3
>>> import cudf
>>> sr = cudf.Series([1, 2, 3, 4])
>>> sr.clip(lower=2, upper=3)
0    2
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=None, upper=3)
0    1
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True)
>>> sr
0    2
1    2
2    3
3    4
dtype: int64
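The element-wise clipping rule can be sketched in pure Python. Values below lower are raised to lower, values above upper are lowered to upper, and None disables a bound. The helper clip below is hypothetical. It handles a single list with scalar bounds and does not model per-column bounds or inplace.

```python
def clip(values, lower=None, upper=None):
    """Hypothetical helper: bound each value to [lower, upper]; None means no bound."""
    out = []
    for v in values:
        if lower is not None and v < lower:
            v = lower
        if upper is not None and v > upper:
            v = upper
        out.append(v)
    return out


print(clip([1, 2, 3, 4], lower=2, upper=3))               # [2, 2, 3, 3]
print(clip([1, 2, 3, 4], upper=3))                        # [1, 2, 3, 3]
print(clip(["a", "b", "c", "d"], lower="b", upper="c"))  # ['b', 'b', 'c', 'c']
```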
-
copy(deep=True)¶ Make a copy of this object.
- Parameters
- deep : bool, default True
Make a deep copy of the data. With deep=False, the underlying data is not copied.
- Returns
- copy : Index
-
cos()¶ Get Trigonometric cosine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.cos()
0    1.000000
1    0.947861
2    0.877583
3    0.525322
4   -0.448074
5   -0.598460
6   -0.283691
dtype: float64
cos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.cos()
      first    second
0  1.000000  0.862319
1  0.283662 -0.283691
2 -0.839072 -0.839039
3 -0.759688 -0.022097
cos operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.cos()
Float64Index([ 0.9210609940028851, 0.8623188722876839, -0.5984600690578581, -0.4480736161291701], dtype='float64')
-
difference(other, sort=None)¶ Return a new Index with elements from the index that are not in other.
This is the set difference of two Index objects.
- Parameters
- other : Index or array-like
- sort : False or None, default None
Whether to sort the resulting index. By default, cudf attempts to sort the values, but any TypeError from incomparable elements is caught.
None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.
False : Do not sort the result.
- Returns
- difference : Index
Examples
>>> import cudf >>> idx1 = cudf.Index([2, 1, 3, 4]) >>> idx1 Int64Index([2, 1, 3, 4], dtype='int64') >>> idx2 = cudf.Index([3, 4, 5, 6]) >>> idx2 Int64Index([3, 4, 5, 6], dtype='int64') >>> idx1.difference(idx2) Int64Index([1, 2], dtype='int64') >>> idx1.difference(idx2, sort=False) Int64Index([2, 1], dtype='int64')
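The effect of the sort parameter can be pictured with a plain-Python sketch over lists of hashable labels. This is not cuDF's GPU implementation: index_difference is a hypothetical helper, and dropping repeated labels from the result is an assumption carried over from pandas' Index.difference.

```python
def index_difference(values, other, sort=None):
    """Hypothetical helper: labels of `values` that are not in `other`.

    Repeated labels are kept once (assumption borrowed from pandas).
    sort=None sorts the result when the labels are mutually comparable;
    sort=False keeps the order of first appearance.
    """
    excluded = set(other)
    seen = set()
    result = []
    for v in values:
        if v not in excluded and v not in seen:
            seen.add(v)
            result.append(v)
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            # Incomparable labels (e.g. int and str): leave the result unsorted.
            pass
    return result


print(index_difference([2, 1, 3, 4], [3, 4, 5, 6]))              # [1, 2]
print(index_difference([2, 1, 3, 4], [3, 4, 5, 6], sort=False))  # [2, 1]
```

The two calls mirror the idx1/idx2 example above.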
-
drop_duplicates(keep='first')¶ Return Index with duplicate values removed
- Parameters
- keep : {‘first’, ‘last’, False}, default ‘first’
- ‘first’ : Drop duplicates except for the first occurrence.
- ‘last’ : Drop duplicates except for the last occurrence.
- False : Drop all duplicates.
- Returns
- deduplicated : Index
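Which labels survive under each keep option can be sketched in plain Python. drop_duplicate_labels is a hypothetical helper, not part of cuDF; it returns survivors in their original positional order, which is an assumption of the sketch (cuDF may return the surviving values in a different order).

```python
from collections import Counter


def drop_duplicate_labels(values, keep="first"):
    """Hypothetical helper mirroring the `keep` options of drop_duplicates."""
    values = list(values)
    if keep == "first":
        seen = set()
        result = []
        for v in values:
            if v not in seen:  # keep only the first occurrence of each label
                seen.add(v)
                result.append(v)
        return result
    if keep == "last":
        # Scan from the end so the last occurrence wins, then restore order.
        return drop_duplicate_labels(values[::-1], keep="first")[::-1]
    if keep is False:
        counts = Counter(values)
        return [v for v in values if counts[v] == 1]  # drop every duplicated label
    raise ValueError("keep must be 'first', 'last' or False")


labels = ["a", "b", "a", "c", "b"]
print(drop_duplicate_labels(labels))               # ['a', 'b', 'c']
print(drop_duplicate_labels(labels, keep="last"))  # ['a', 'c', 'b']
print(drop_duplicate_labels(labels, keep=False))   # ['c']
```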
-
dropna(how='any')¶ Return an Index with null values removed.
- Parameters
- how : {‘any’, ‘all’}, default ‘any’
If the Index is a MultiIndex, drop the value when any or all levels are NaN.
- Returns
- valid : Index
Examples
>>> import cudf >>> index = cudf.Index(['a', None, 'b', 'c']) >>> index StringIndex(['a' None 'b' 'c'], dtype='object') >>> index.dropna() StringIndex(['a' 'b' 'c'], dtype='object')
Using dropna on a MultiIndex:
>>> midx = cudf.MultiIndex( ... levels=[[1, None, 4, None], [1, 2, 5]], ... codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]], ... names=["x", "y"], ... ) >>> midx MultiIndex(levels=[0 1 1 null 2 4 3 null dtype: int64, 0 1 1 2 2 5 dtype: int64], codes= x y 0 0 0 1 0 2 2 1 1 3 2 1 4 3 0) >>> midx.dropna() MultiIndex(levels=[0 1 1 4 dtype: int64, 0 1 1 2 2 5 dtype: int64], codes= x y 0 0 0 1 0 2 2 1 1)
-
property
dtype¶ dtype of the underlying values in GenericIndex.
-
property
empty¶ Indicator whether Index is empty.
True if Index is entirely empty (no items).
- Returns
- out : bool
If Index is empty, return True, if not return False.
-
equals(other)¶ Determine if two Index objects contain the same elements.
- Returns
- out: bool
True if “other” is an Index and it has the same elements as calling index; False otherwise.
-
exp()¶ Get the exponential of all elements, element-wise.
Exponential is the inverse of the log function, so that x.exp().log() = x
- Returns
- DataFrame/Series/Index
Result of the element-wise exponential.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100]) >>> ser 0 -1.00000 1 0.00000 2 1.00000 3 0.32434 4 0.50000 5 -10.00000 6 100.00000 dtype: float64 >>> ser.exp() 0 3.678794e-01 1 1.000000e+00 2 2.718282e+00 3 1.383117e+00 4 1.648721e+00 5 4.539993e-05 6 2.688117e+43 dtype: float64
exp operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5], ... 'second': [0.234, 0.3, 10]}) >>> df first second 0 -1.0 0.234 1 -10.0 0.300 2 0.5 10.000 >>> df.exp() first second 0 0.367879 1.263644 1 0.000045 1.349859 2 1.648721 22026.465795
exp operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64') >>> index.exp() Float64Index([0.36787944117144233, 1.4918246976412703, 2.718281828459045, 1.0, 1.3498588075760032], dtype='float64')
-
fillna(value, downcast=None)¶ Fill null values with the specified value.
- Parameters
- value : scalar
Scalar value used to fill nulls. This value cannot be list-like.
- downcast : dict, default None
This parameter is currently NON-FUNCTIONAL.
- Returns
- filled : Index
Examples
>>> import cudf >>> index = cudf.Index([1, 2, None, 4]) >>> index Int64Index([1, 2, null, 4], dtype='int64') >>> index.fillna(3) Int64Index([1, 2, 3, 4], dtype='int64')
-
find_label_range(first, last)¶ Find range that starts with first and ends with last, inclusively.
- Returns
- begin, end : 2-tuple of int
The starting index and the ending index. The last value occurs at position end - 1.
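For an index sorted in ascending order, the (begin, end) pair can be pictured with binary search. This is a plain-Python sketch that assumes sorted labels; find_label_range_sorted is a hypothetical helper, and cuDF's handling of unsorted indexes or labels that are absent may differ.

```python
from bisect import bisect_left, bisect_right


def find_label_range_sorted(labels, first, last):
    """Hypothetical helper for ascending labels.

    Returns (begin, end) such that labels[begin:end] spans first..last
    inclusively, so the last matching label sits at position end - 1.
    """
    begin = bisect_left(labels, first)  # first position holding a label >= first
    end = bisect_right(labels, last)    # one past the last position holding a label <= last
    return begin, end


labels = [10, 20, 20, 30, 40]
begin, end = find_label_range_sorted(labels, 20, 30)
print(begin, end, labels[begin:end])  # 1 4 [20, 20, 30]
```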
-
classmethod
from_pandas(index, nan_as_null=None)¶ Convert from a Pandas Index.
- Parameters
- index : Pandas Index object
A Pandas Index object to be converted to a cuDF Index.
- nan_as_null : bool, default None
If None or True, converts np.nan values to null values. If False, leaves np.nan values as is.
- Raises
- TypeError for invalid input type.
Examples
>>> import cudf >>> import pandas as pd >>> import numpy as np >>> data = [10, 20, 30, np.nan] >>> pdi = pd.Index(data) >>> cudf.core.index.Index.from_pandas(pdi) Index(['10.0', '20.0', '30.0', 'null'], dtype='object') >>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False) Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
-
get_level_values(level)¶ Return an Index of values for requested level.
This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.
- Parameters
- level : int or str
Either the integer position or the name of the level.
- Returns
- Index
The calling object, as there is only one level in the Index.
See also
cudf.core.multiindex.MultiIndex.get_level_values : Get values for a level of a MultiIndex.
Notes
For Index, level should be 0, since there are no multiple levels.
Examples
>>> import cudf >>> idx = cudf.core.index.StringIndex(["a","b","c"]) >>> idx.get_level_values(0) StringIndex(['a' 'b' 'c'], dtype='object')
-
get_slice_bound(label, side, kind)¶ Calculate the slice bound that corresponds to the given label. Returns the leftmost position of the label (one past the rightmost position if side=='right').
- Parameters
- label : object
- side : {‘left’, ‘right’}
- kind : {‘ix’, ‘loc’, ‘getitem’}
- Returns
- int
Index of label.
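On a monotonically increasing index, the left and right bounds behave like Python's bisect_left and bisect_right. The sketch below assumes sorted labels and ignores kind; slice_bound is a hypothetical helper, not cuDF's implementation.

```python
from bisect import bisect_left, bisect_right


def slice_bound(labels, label, side):
    """Hypothetical helper for a list of labels sorted in ascending order."""
    if side == "left":
        return bisect_left(labels, label)   # leftmost position of label
    if side == "right":
        return bisect_right(labels, label)  # one past the rightmost position
    raise ValueError("side must be 'left' or 'right'")


labels = [1, 2, 2, 2, 5]
print(slice_bound(labels, 2, "left"), slice_bound(labels, 2, "right"))  # 1 4
```

With repeated labels, the gap between the two bounds equals the number of occurrences.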
-
property
gpu_values¶ View the data as a numba device array object
-
interleave_columns()¶ Interleave Series columns of a table into a single column.
Converts the column-major table into a single row-major column.
- Parameters
- cols : input Table containing columns to interleave.
- Returns
- The interleaved columns as a single column
Examples
>>> import cudf >>> df = cudf.DataFrame({0: ['A1', 'A2', 'A3'], 1: ['B1', 'B2', 'B3']}) >>> df 0 1 0 A1 B1 1 A2 B2 2 A3 B3 >>> df.interleave_columns() 0 A1 1 B1 2 A2 3 B2 4 A3 5 B3
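Interleaving reads a column-major table row by row. The following plain-Python sketch models each column as a list; interleave_columns here is a hypothetical stand-alone function, not the cuDF method.

```python
def interleave_columns(columns):
    """Hypothetical helper: flatten a column-major table in row-major order."""
    if not columns:
        return []
    n_rows = len(columns[0])
    if any(len(col) != n_rows for col in columns):
        raise ValueError("all columns must have the same length")
    # For each row, emit the value from every column, in column order.
    return [col[i] for i in range(n_rows) for col in columns]


print(interleave_columns([["A1", "A2", "A3"], ["B1", "B2", "B3"]]))
# ['A1', 'B1', 'A2', 'B2', 'A3', 'B3']
```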
-
property
is_monotonic¶ Alias for is_monotonic_increasing.
-
property
is_monotonic_decreasing¶ Return if the index is monotonic decreasing (only equal or decreasing) values.
-
property
is_monotonic_increasing¶ Return if the index is monotonic increasing (only equal or increasing) values.
-
property
is_unique¶ Return if the index has unique values.
-
isin(values)¶ Return a boolean array where the index values are in values.
Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.
- Parameters
- values : set, list-like, Index
Sought values.
- Returns
- is_contained : cupy array
CuPy array of boolean values.
-
isna()¶ Identify missing values. Alias for isnull
-
isnull()¶ Identify missing values.
-
join(other, how='left', level=None, return_indexers=False, sort=False)¶ Compute join_index and indexers to conform data structures to the new index.
- Parameters
- other : Index
- how : {‘left’, ‘right’, ‘inner’, ‘outer’}
- return_indexers : bool, default False
- sort : bool, default False
Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).
- Returns
- index
Examples
>>> import cudf >>> lhs = cudf.DataFrame( ... {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b'] ... ).index >>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index >>> lhs.join(rhs, how='inner') MultiIndex(levels=[0 1 1 3 dtype: int64, 0 2 1 4 dtype: int64], codes= a b 0 1 1 1 0 0)
-
log()¶ Get the natural logarithm of all elements, element-wise.
Natural logarithm is the inverse of the exp function, so that x.log().exp() = x
- Returns
- DataFrame/Series/Index
Result of the element-wise natural logarithm.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100]) >>> ser 0 -1.00000 1 0.00000 2 1.00000 3 0.32434 4 0.50000 5 -10.00000 6 100.00000 dtype: float64 >>> ser.log() 0 NaN 1 -inf 2 0.000000 3 -1.125963 4 -0.693147 5 NaN 6 4.605170 dtype: float64
log operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5], ... 'second': [0.234, 0.3, 10]}) >>> df first second 0 -1.0 0.234 1 -10.0 0.300 2 0.5 10.000 >>> df.log() first second 0 NaN -1.452434 1 NaN -1.203973 2 -0.693147 2.302585
log operation on Index:
>>> index = cudf.Index([10, 11, 500.0]) >>> index Float64Index([10.0, 11.0, 500.0], dtype='float64') >>> index.log() Float64Index([2.302585092994046, 2.3978952727983707, 6.214608098422191], dtype='float64')
-
mask(cond, other=None, inplace=False)¶ Replace values where the condition is True.
- Parameters
- cond : bool Series/DataFrame, array-like
Where cond is False, keep the original value. Where True, replace with the corresponding value from other. Callables are not supported.
- other : scalar, list of scalars, Series/DataFrame
Entries where cond is True are replaced with the corresponding value from other. Callables are not supported. Default is None.
A DataFrame expects only a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.
A Series expects only a scalar or a Series-like object of the same length.
- inplace : bool, default False
Whether to perform the operation in place on the data.
- Returns
- Same type as caller
Examples
>>> import cudf >>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]}) >>> df.mask(df % 2 == 0, [-1, -1]) A B 0 1 3 1 -1 5 2 5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0]) >>> ser.mask(ser > 2, 10) 0 10 1 10 2 2 3 1 4 0 dtype: int64 >>> ser.mask(ser > 2) 0 null 1 null 2 2 3 1 4 0 dtype: int64
-
max()¶ Return the maximum value of the Index.
- Returns
- scalar
Maximum value.
See also
Index.min : Return the minimum value in an Index.
cudf.core.series.Series.max : Return the maximum value in a Series.
cudf.core.dataframe.DataFrame.max : Return the maximum values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.max() 3
-
memory_usage(deep=False)¶ Memory usage of the values.
- Parameters
- deep : bool
Introspect the data deeply, interrogate object dtypes for system-level memory consumption.
- Returns
- bytes used
-
min()¶ Return the minimum value of the Index.
- Returns
- scalar
Minimum value.
See also
Index.max : Return the maximum value in an Index.
cudf.core.series.Series.min : Return the minimum value in a Series.
cudf.core.dataframe.DataFrame.min : Return the minimum values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.min() 1
-
property
name¶ Returns the name of the Index.
-
property
names¶ Returns a tuple containing the name of the Index.
-
property
ndim¶ Dimension of the data. Apart from MultiIndex ndim is always 1.
-
notna()¶ Identify non-missing values. Alias for notnull.
-
notnull()¶ Identify non-missing values.
-
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)¶ Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.
- Parameters
- axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Index to direct ranking.
- method : {‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’
How to rank the group of records that have the same value (i.e. ties):
* average: average rank of the group
* min: lowest rank in the group
* max: highest rank in the group
* first: ranks assigned in the order they appear in the array
* dense: like ‘min’, but rank always increases by 1 between groups
- numeric_only : bool, optional
For DataFrame objects, rank only numeric columns if set to True.
- na_option : {‘keep’, ‘top’, ‘bottom’}, default ‘keep’
How to rank NaN values:
* keep: assign NaN rank to NaN values
* top: assign smallest rank to NaN values if ascending
* bottom: assign highest rank to NaN values if ascending
- ascending : bool, default True
Whether or not the elements should be ranked in ascending order.
- pct : bool, default False
Whether or not to display the returned rankings in percentile form.
- Returns
- same type as caller
Return a Series or DataFrame with data ranks as values.
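The five tie-breaking methods differ only in how they assign ranks inside a run of equal values. The plain-Python sketch below assumes ascending order, no nulls and pct=False; rank_values is a hypothetical helper, not cuDF's implementation.

```python
def rank_values(values, method="average"):
    """Hypothetical helper: 1-based ascending ranks of non-null values."""
    order = sorted(range(len(values)), key=lambda i: values[i])  # stable sort
    ranks = [0.0] * len(values)
    dense = 0
    i = 0
    while i < len(order):
        # order[i:j] is a run of tied values occupying ranks i+1 .. j.
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        dense += 1
        for k in range(i, j):
            pos = order[k]
            if method == "average":
                ranks[pos] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
            elif method == "min":
                ranks[pos] = float(i + 1)
            elif method == "max":
                ranks[pos] = float(j)
            elif method == "first":
                ranks[pos] = float(k + 1)  # stable sort keeps order of appearance
            elif method == "dense":
                ranks[pos] = float(dense)
            else:
                raise ValueError(f"unknown method: {method}")
        i = j
    return ranks


data = [3, 1, 4, 1, 5]
print(rank_values(data))                  # [3.0, 1.5, 4.0, 1.5, 5.0]
print(rank_values(data, method="dense"))  # [2.0, 1.0, 3.0, 1.0, 4.0]
```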
-
rename(name, inplace=False)¶ Alter Index name.
Defaults to returning new index.
- Parameters
- name : label
Name(s) to set.
- Returns
- Index
-
repeat(repeats, axis=None)¶ Repeats elements consecutively.
Returns a new object of the caller's type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.
- Parameters
- repeats : int, or array of ints
The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.
- Returns
- Series/DataFrame/Index
A newly created object of same type as caller with repeated elements.
Examples
>>> import cudf >>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]}) >>> df a b 0 1 10 1 2 20 2 3 30 >>> df.repeat(3) a b 0 1 10 0 1 10 0 1 10 1 2 20 1 2 20 1 2 20 2 3 30 2 3 30 2 3 30
Repeat on Series
>>> s = cudf.Series([0, 2]) >>> s 0 0 1 2 dtype: int64 >>> s.repeat([3, 4]) 0 0 0 0 0 0 1 2 1 2 1 2 1 2 dtype: int64 >>> s.repeat(2) 0 0 0 0 1 2 1 2 dtype: int64
Repeat on Index
>>> index = cudf.Index([10, 22, 33, 55]) >>> index Int64Index([10, 22, 33, 55], dtype='int64') >>> index.repeat(5) Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33, 33, 33, 33, 33, 55, 55, 55, 55, 55], dtype='int64')
-
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)¶ Return a random sample of items from an axis of object.
You can use random_state for reproducibility.
- Parameters
- n : int, optional
Number of items from axis to return. Cannot be used with frac. Defaults to 1 if frac is None.
- frac : float, optional
Fraction of axis items to return. Cannot be used with n.
- replace : bool, default False
Allow or disallow sampling of the same row more than once. replace=True is not yet supported for axis=1/"columns".
- weights : str or ndarray-like, optional
Only supported for axis=1/"columns".
- random_state : int or None, default None
Seed for the random number generator (if int), or None. If None, a random seed will be chosen.
- axis : {0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index don’t support axis=1.
- Returns
- Series or DataFrame or Index
A new object of same type as caller containing n items randomly sampled from the caller object.
Examples
>>> import cudf >>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]}) >>> df.sample(3) a 1 2 3 4 0 1
>>> sr = cudf.Series([1, 2, 3, 4, 5]) >>> sr.sample(10, replace=True) 1 4 3 1 2 4 0 5 0 1 4 5 4 1 0 2 0 3 3 2 dtype: int64
>>> df = cudf.DataFrame( ... {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]}) >>> df.sample(2, axis=1) a c 0 1 3 1 2 4
-
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)¶ Scatter to a list of dataframes.
Uses map_index to determine the destination of each row of the original DataFrame.
- Parameters
- map_index : Series, str or list-like
Scatter assignment for each row.
- map_size : int
Length of the output list. Must be greater than or equal to the number of unique values in map_index.
- keep_index : bool
Conserve original index values for each row.
- Returns
- A list of cudf.DataFrame objects.
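The routing performed by scatter_by_map can be sketched in plain Python with rows as tuples. This sketch assumes map_index holds integer destination ids in range(map_size); scatter_rows_by_map is a hypothetical helper, and the case where map_index names a column (str) is not modeled.

```python
def scatter_rows_by_map(rows, map_index, map_size=None):
    """Hypothetical helper: send rows[i] to output partition map_index[i]."""
    if len(rows) != len(map_index):
        raise ValueError("map_index must have one entry per row")
    if map_size is None:
        map_size = max(map_index) + 1 if map_index else 0
    partitions = [[] for _ in range(map_size)]
    for row, dest in zip(rows, map_index):
        if not 0 <= dest < map_size:
            raise ValueError(f"destination {dest} is outside range(map_size)")
        partitions[dest].append(row)  # rows keep their relative order
    return partitions


rows = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
print(scatter_rows_by_map(rows, [1, 0, 1, 2]))
# [[('b', 2)], [('a', 1), ('c', 3)], [('d', 4)]]
```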
-
searchsorted(values, side='left', ascending=True, na_position='last')¶ Find indices where elements should be inserted to maintain order.
- Parameters
- values : Frame (shape must be consistent with self)
Values to be hypothetically inserted into self.
- side : str {‘left’, ‘right’}, optional, default ‘left’
If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.
- ascending : bool, optional, default True
Whether the sorted Frame is in ascending order (otherwise descending).
- na_position : str {‘last’, ‘first’}, optional, default ‘last’
Position of null values in sorted order.
- Returns
- 1-D cupy array of insertion points
Examples
>>> s = cudf.Series([1, 2, 3]) >>> s.searchsorted(4) 3 >>> s.searchsorted([0, 4]) array([0, 3], dtype=int32) >>> s.searchsorted([1, 3], side='left') array([0, 2], dtype=int32) >>> s.searchsorted([1, 3], side='right') array([1, 3], dtype=int32)
If the values are not monotonically sorted, wrong locations may be returned:
>>> s = cudf.Series([2, 1, 3]) >>> s.searchsorted(1) 0 # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]}) >>> df a b 0 1 10 1 3 12 2 5 14 3 7 16 >>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6], ... 'b': [10, 11, 13, 15]}) >>> values_df a b 0 0 10 1 2 11 2 5 13 3 6 15 >>> df.searchsorted(values_df, ascending=False) array([4, 4, 4, 0], dtype=int32)
-
property
shape¶ Returns a tuple representing the dimensionality of the Index.
-
shift(periods=1, freq=None, axis=0, fill_value=None)¶ Shift values by periods positions.
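The description of shift is terse, so here is a plain-Python sketch of the usual pandas-style semantics: positive periods move values toward the end, negative periods toward the start, and vacated slots receive fill_value. None stands in for null; freq and axis are not modeled, and shift_values is a hypothetical helper rather than cuDF's implementation.

```python
def shift_values(values, periods=1, fill_value=None):
    """Hypothetical helper: move values by `periods` positions, filling the gap."""
    values = list(values)
    n = len(values)
    if abs(periods) >= n:
        return [fill_value] * n  # everything is shifted out
    if periods >= 0:
        return [fill_value] * periods + values[: n - periods]
    return values[-periods:] + [fill_value] * (-periods)


print(shift_values([1, 2, 3, 4]))                   # [None, 1, 2, 3]
print(shift_values([1, 2, 3, 4], -1))               # [2, 3, 4, None]
print(shift_values([1, 2, 3, 4], 2, fill_value=0))  # [0, 0, 1, 2]
```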
-
sin()¶ Get Trigonometric sine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360]) >>> ser 0 0.00000 1 0.32434 2 0.50000 3 45.00000 4 90.00000 5 180.00000 6 360.00000 dtype: float64 >>> ser.sin() 0 0.000000 1 0.318683 2 0.479426 3 0.850904 4 0.893997 5 -0.801153 6 0.958916 dtype: float64
sin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15], ... 'second': [100.0, 360, 720, 300]}) >>> df first second 0 0.0 100.0 1 5.0 360.0 2 10.0 720.0 3 15.0 300.0 >>> df.sin() first second 0 0.000000 -0.506366 1 -0.958924 0.958916 2 -0.544021 -0.544072 3 0.650288 -0.999756
sin operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90]) >>> index Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64') >>> index.sin() Float64Index([-0.3894183423086505, -0.5063656411097588, 0.8011526357338306, 0.8939966636005579], dtype='float64')
-
property
size¶ Return the number of elements in the underlying data.
- Returns
- size : Size of the DataFrame / Index / Series / MultiIndex
Examples
Size of an empty dataframe is 0.
>>> import cudf >>> df = cudf.DataFrame() >>> df Empty DataFrame Columns: [] Index: [] >>> df.size 0 >>> df = cudf.DataFrame(index=[1, 2, 3]) >>> df Empty DataFrame Columns: [] Index: [1, 2, 3] >>> df.size 0
DataFrame with values
>>> df = cudf.DataFrame({'a': [10, 11, 12], ... 'b': ['hello', 'rapids', 'ai']}) >>> df a b 0 10 hello 1 11 rapids 2 12 ai >>> df.size 6 >>> df.index RangeIndex(start=0, stop=3) >>> df.index.size 3
Size of an Index
>>> index = cudf.Index([]) >>> index Float64Index([], dtype='float64') >>> index.size 0 >>> index = cudf.Index([1, 2, 3, 10]) >>> index Int64Index([1, 2, 3, 10], dtype='int64') >>> index.size 4
Size of a MultiIndex
>>> midx = cudf.MultiIndex( ... levels=[["a", "b", "c", None], ["1", None, "5"]], ... codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]], ... names=["x", "y"], ... ) >>> midx MultiIndex(levels=[0 a 1 b 2 c 3 None dtype: object, 0 1 1 None 2 5 dtype: object], codes= x y 0 0 0 1 0 2 2 1 1 3 2 1 4 3 0) >>> midx.size 5
-
sort_values(return_indexer=False, ascending=True, key=None)¶ Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
- Parameters
- return_indexer : bool, default False
Whether to also return the indices that would sort the index.
- ascending : bool, default True
Whether to sort the index values in ascending order.
- key : None, optional
This parameter is NON-FUNCTIONAL.
- Returns
- sorted_index : Index
Sorted copy of the index.
- indexer : cupy.ndarray, optional
The indices that the index itself was sorted by.
See also
cudf.core.series.Series.sort_values : Sort values of a Series.
cudf.core.dataframe.DataFrame.sort_values : Sort values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([10, 100, 1, 1000]) >>> idx Int64Index([10, 100, 1, 1000], dtype='int64')
Sort values in ascending order (default behavior):
>>> idx.sort_values() Int64Index([1, 10, 100, 1000], dtype='int64')
Sort values in descending order, and also get the indices idx was sorted by:
>>> idx.sort_values(ascending=False, return_indexer=True) (Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2], dtype=int32))
-
sqrt()¶ Get the non-negative square-root of all elements, element-wise.
- Returns
- DataFrame/Series/Index
Result of the non-negative square-root of each element.
Examples
>>> import cudf >>> ser = cudf.Series([10, 25, 81, 1.0, 100]) >>> ser 0 10.0 1 25.0 2 81.0 3 1.0 4 100.0 dtype: float64 >>> ser.sqrt() 0 3.162278 1 5.000000 2 9.000000 3 1.000000 4 10.000000 dtype: float64
sqrt operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-10.0, 100, 625], ... 'second': [1, 2, 0.4]}) >>> df first second 0 -10.0 1.0 1 100.0 2.0 2 625.0 0.4 >>> df.sqrt() first second 0 NaN 1.000000 1 10.0 1.414214 2 25.0 0.632456
sqrt operation on Index:
>>> index = cudf.Index([-10.0, 100, 625]) >>> index Float64Index([-10.0, 100.0, 625.0], dtype='float64') >>> index.sqrt() Float64Index([nan, 10.0, 25.0], dtype='float64')
-
sum()¶ Return the sum of all values of the Index.
- Returns
- scalar
Sum of all values.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.sum() 6
-
take(indices)¶ Gather only the specific subset of indices
- Parameters
- indices: An array-like that maps to values contained in this Index.
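Gathering by position can be sketched in plain Python. take_labels is a hypothetical helper; it accepts only non-negative, in-bounds positions, and whether cuDF accepts negative positions is not covered here.

```python
def take_labels(labels, indices):
    """Hypothetical helper: gather labels at the given positions, in the given order."""
    n = len(labels)
    result = []
    for i in indices:
        if not 0 <= i < n:
            raise IndexError(f"position {i} is out of bounds for length {n}")
        result.append(labels[i])  # positions may repeat
    return result


print(take_labels([10, 20, 30, 40], [3, 0, 0]))  # [40, 10, 10]
```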
-
tan()¶ Get Trigonometric tangent, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360]) >>> ser 0 0.00000 1 0.32434 2 0.50000 3 45.00000 4 90.00000 5 180.00000 6 360.00000 dtype: float64 >>> ser.tan() 0 0.000000 1 0.336213 2 0.546302 3 1.619775 4 -1.995200 5 1.338690 6 -3.380140 dtype: float64
tan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15], ... 'second': [100.0, 360, 720, 300]}) >>> df first second 0 0.0 100.0 1 5.0 360.0 2 10.0 720.0 3 15.0 300.0 >>> df.tan() first second 0 0.000000 -0.587214 1 -3.380515 -3.380140 2 0.648361 0.648446 3 -0.855993 45.244742
tan operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90]) >>> index Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64') >>> index.tan() Float64Index([-0.4227932187381618, -0.587213915156929, -1.3386902103511544, -1.995200412208242], dtype='float64')
-
tile(count)¶ Repeats the rows from self DataFrame count times to form a new DataFrame.
- Parameters
- self : input Table whose rows are tiled.
- count : Number of times to tile “rows”. Must be non-negative.
- Returns
- The table containing the tiled “rows”.
Examples
>>> import cudf >>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]]) >>> count = 2 >>> df.tile(count) 0 1 2 0 8 4 7 1 5 2 3 0 8 4 7 1 5 2 3
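Tiling stacks whole copies of the table one after another, unlike repeat, which repeats each row consecutively. In this plain-Python sketch, tile_rows is a hypothetical helper and rows are modeled as lists.

```python
def tile_rows(rows, count):
    """Hypothetical helper: stack `count` copies of all rows, in order."""
    if count < 0:
        raise ValueError("count must be non-negative")
    # Copy each row so the output does not share list objects with the input.
    return [list(row) for _ in range(count) for row in rows]


print(tile_rows([[8, 4, 7], [5, 2, 3]], 2))
# [[8, 4, 7], [5, 2, 3], [8, 4, 7], [5, 2, 3]]
```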
-
to_array(fillna=None)¶ Get a dense numpy array for the data.
- Parameters
- fillna : str or None
Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.
Notes
If fillna is None, null values are skipped. Therefore, the output size could be smaller.
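The difference between the two fillna modes is easiest to see with explicit nulls. In this plain-Python sketch, None marks a null entry and dense_values is a hypothetical helper. Dtype promotion is not modeled, since Python lists carry no dtype.

```python
import math


def dense_values(values, fillna=None):
    """Hypothetical helper: None marks a null entry."""
    if fillna is None:
        # Nulls are skipped, so the output can be shorter than the input.
        return [v for v in values if v is not None]
    if fillna == "pandas":
        # Nulls become NaN and the output keeps the input length.
        return [math.nan if v is None else v for v in values]
    raise ValueError("fillna must be None or 'pandas'")


print(dense_values([1, None, 3]))            # [1, 3]
print(dense_values([1, None, 3], "pandas"))  # [1, nan, 3]
```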
-
to_arrow()¶ Convert Index to a PyArrow Array.
Examples
>>> import cudf >>> idx = cudf.Index([-3, 10, 15, 20]) >>> idx.to_arrow() <pyarrow.lib.Int64Array object at 0x7fcaa6f53440> [ -3, 10, 15, 20 ]
-
to_dlpack()¶ Converts a cuDF object into a DLPack tensor.
DLPack is an open-source memory tensor structure: dmlc/dlpack.
This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.
- Parameters
- cudf_obj : DataFrame, Series, Index, or Column
- Returns
- pycapsule_obj : PyCapsule
Output DLPack tensor pointer which is encapsulated in a PyCapsule object.
-
to_frame(index=True, name=None)¶ Create a DataFrame with a column containing this Index
- Parameters
- index : boolean, default True
Set the index of the returned DataFrame as the original Index
- name : str, default None
Name to be used for the column
- Returns
- DataFrame
cudf DataFrame
-
to_pandas()¶ Convert to a Pandas Index.
Examples
>>> import cudf >>> idx = cudf.Index([-3, 10, 15, 20]) >>> idx Int64Index([-3, 10, 15, 20], dtype='int64') >>> idx.to_pandas() Int64Index([-3, 10, 15, 20], dtype='int64') >>> type(idx.to_pandas()) <class 'pandas.core.indexes.numeric.Int64Index'> >>> type(idx) <class 'cudf.core.index.GenericIndex'>
-
to_series(index=None, name=None)¶ Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.
- Parameters
- index : Index, optional
Index of resulting Series. If None, defaults to original index.
- name : str, optional
Name of resulting Series. If None, defaults to name of original index.
- Returns
- Series
The dtype will be based on the type of the Index values.
-
unique()¶ Return unique values in the index.
- Returns
- Index without duplicates
-
property
values¶ Return an array representing the data in the Index.
- Returns
- array : A cupy array of data in the Index.
Examples
>>> import cudf >>> index = cudf.Index([1, -10, 100, 20]) >>> index.values array([ 1, -10, 100, 20]) >>> type(index.values) <class 'cupy.core.core.ndarray'>
-
property
values_host¶ Return a numpy representation of the Index.
Only the values in the Index will be returned.
- Returns
- out : numpy.ndarray
The values of the Index.
Examples
>>> import cudf >>> index = cudf.Index([1, -10, 100, 20]) >>> index.values_host array([ 1, -10, 100, 20]) >>> type(index.values_host) <class 'numpy.ndarray'>
-
where(cond, other=None)¶ Replace values where the condition is False.
- Parameters
- cond : bool array-like with the same length as self
Where cond is True, keep the original value. Where False, replace with the corresponding value from other. Callables are not supported.
- other : scalar, or array-like
Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.
- Returns
- Same type as caller
Examples
>>> import cudf >>> index = cudf.Index([4, 3, 2, 1, 0]) >>> index Int64Index([4, 3, 2, 1, 0], dtype='int64') >>> index.where(index > 2, 15) Int64Index([4, 3, 15, 15, 15], dtype='int64')
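where and mask are mirror images: where keeps values where cond is True, and mask replaces them there. The plain-Python sketch below models both on lists. where_values and mask_values are hypothetical helpers, and None stands in for the default null fill.

```python
def where_values(values, cond, other=None):
    """Hypothetical helper: keep values[i] where cond[i] is True, else use other."""
    if len(cond) != len(values):
        raise ValueError("cond must have the same length as values")
    if isinstance(other, (list, tuple)):
        if len(other) != len(values):
            raise ValueError("array-like other must match the length of values")
        return [v if c else o for v, c, o in zip(values, cond, other)]
    return [v if c else other for v, c in zip(values, cond)]


def mask_values(values, cond, other=None):
    """Hypothetical helper: mask is where with the condition inverted."""
    return where_values(values, [not c for c in cond], other)


idx = [4, 3, 2, 1, 0]
print(where_values(idx, [v > 2 for v in idx], 15))  # [4, 3, 15, 15, 15]
print(mask_values(idx, [v > 2 for v in idx], 10))   # [10, 10, 2, 1, 0]
```

The two calls reproduce the Index.where example above and the Series.mask example in the mask section.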
UInt16Index¶
-
class
cudf.core.index.UInt16Index(data=None, dtype=None, copy=False, name=None)¶ Immutable, ordered and sliceable sequence of integer labels. The basic object storing row labels for all cuDF objects.
- Parameters
- data : array-like (1-dimensional) / DataFrame
If it is a DataFrame, it will return a MultiIndex.
- dtype : NumPy dtype (default: object)
If dtype is None, we find the dtype that best fits the data.
- copy : bool
Make a copy of input data.
- name : object
Name to be stored in the index.
- tupleize_cols : bool (default: True)
When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.
- Returns
- Index
cudf Index
Examples
>>> import cudf >>> cudf.Index([1, 2, 3], dtype="uint64", name="a") UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]})) MultiIndex(levels=[0 1 1 2 dtype: int64, 0 2 1 3 dtype: int64], codes= a b 0 0 0 1 1 1)
- Attributes
dtype : dtype of the underlying values in GenericIndex.
empty : Indicator whether Index is empty.
gpu_values : View the data as a numba device array object.
is_monotonic : Alias for is_monotonic_increasing.
is_monotonic_decreasing : Return if the index is monotonic decreasing (only equal or decreasing) values.
is_monotonic_increasing : Return if the index is monotonic increasing (only equal or increasing) values.
is_unique : Return if the index has unique values.
name : Returns the name of the Index.
names : Returns a tuple containing the name of the Index.
ndim : Dimension of the data.
shape : Returns a tuple representing the dimensionality of the Index.
size : Return the number of elements in the underlying data.
values : Return an array representing the data in the Index.
values_host : Return a numpy representation of the Index.
Methods
acos() : Get Trigonometric inverse cosine, element-wise.
any() : Return whether any element is True in Index.
append(other) : Append a collection of Index options together.
argsort([ascending]) : Return the integer indices that would sort the index.
asin() : Get Trigonometric inverse sine, element-wise.
astype(dtype[, copy]) : Create an Index with values cast to dtypes.
atan() : Get Trigonometric inverse tangent, element-wise.
clip([lower, upper, inplace, axis]) : Trim values at input threshold(s).
copy([deep]) : Make a copy of this object.
cos() : Get Trigonometric cosine, element-wise.
difference(other[, sort]) : Return a new Index with elements from the index that are not in other.
drop_duplicates([keep]) : Return Index with duplicate values removed.
dropna([how]) : Return an Index with null values removed.
equals(other) : Determine if two Index objects contain the same elements.
exp() : Get the exponential of all elements, element-wise.
fillna(value[, downcast]) : Fill null values with the specified value.
find_label_range(first, last) : Find range that starts with first and ends with last, inclusively.
from_pandas(index[, nan_as_null]) : Convert from a Pandas Index.
get_level_values(level) : Return an Index of values for requested level.
get_slice_bound(label, side, kind) : Calculate slice bound that corresponds to given label.
interleave_columns() : Interleave Series columns of a table into a single column.
isin(values) : Return a boolean array where the index values are in values.
isna() : Identify missing values.
isnull() : Identify missing values.
join(other[, how, level, return_indexers, sort]) : Compute join_index and indexers to conform data structures to the new index.
log() : Get the natural logarithm of all elements, element-wise.
mask(cond[, other, inplace]) : Replace values where the condition is True.
max() : Return the maximum value of the Index.
memory_usage([deep]) : Memory usage of the values.
min() : Return the minimum value of the Index.
notna() : Identify non-missing values.
notnull() : Identify non-missing values.
rank([axis, method, numeric_only, …]) : Compute numerical data ranks (1 through n) along axis.
rename(name[, inplace]) : Alter Index name.
repeat(repeats[, axis]) : Repeats elements consecutively.
sample([n, frac, replace, weights, …]) : Return a random sample of items from an axis of object.
scatter_by_map(map_index[, map_size, keep_index]) : Scatter to a list of dataframes.
searchsorted(values[, side, ascending, …]) : Find indices where elements should be inserted to maintain order.
shift([periods, freq, axis, fill_value]) : Shift values by periods positions.
sin() : Get Trigonometric sine, element-wise.
sort_values([return_indexer, ascending, key]) : Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
sqrt() : Get the non-negative square-root of all elements, element-wise.
sum() : Return the sum of all values of the Index.
take(indices) : Gather only the specific subset of indices.
tan() : Get Trigonometric tangent, element-wise.
tile(count) : Repeats the rows from self DataFrame count times to form a new DataFrame.
to_array([fillna]) : Get a dense numpy array for the data.
to_arrow() : Convert Index to a PyArrow Array.
to_dlpack() : Converts a cuDF object into a DLPack tensor.
to_frame([index, name]) : Create a DataFrame with a column containing this Index.
to_pandas() : Convert to a Pandas Index.
to_series([index, name]) : Create a Series with both index and values equal to the index keys.
unique() : Return unique values in the index.
where(cond[, other]) : Replace values where the condition is False.
replace
-
acos()¶ Get Trigonometric inverse cosine, element-wise.
The inverse of cos so that, if y = x.cos(), then x = y.acos()
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5]) >>> ser.acos() 0 3.141593 1 1.570796 2 0.000000 3 1.240482 4 1.047198 dtype: float64
acos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5], ... 'second': [0.234, 0.3, 0.1]}) >>> df first second 0 -1.0 0.234 1 0.0 0.300 2 0.5 0.100 >>> df.acos() first second 0 3.141593 1.334606 1 1.570796 1.266104 2 1.047198 1.470629
acos operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64') >>> index.acos() Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0, 1.5707963267948966, 1.266103672779499], dtype='float64')
-
any()¶ Return whether any element is True in the Index.
-
append(other)¶ Append a collection of Index objects together.
- Parameters
- otherIndex or list/tuple of indices
- Returns
- appendedIndex
Examples
>>> import cudf >>> idx = cudf.Index([1, 2, 10, 100]) >>> idx Int64Index([1, 2, 10, 100], dtype='int64') >>> other = cudf.Index([200, 400, 50]) >>> other Int64Index([200, 400, 50], dtype='int64') >>> idx.append(other) Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')
append also accepts a list of Index objects:
>>> idx.append([other, other]) Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
-
argsort(ascending=True, **kwargs)¶ Return the integer indices that would sort the index.
- Parameters
- ascendingbool, default True
If True, returns the indices for ascending order. If False, returns the indices for descending order.
- Returns
- arrayA CuPy array containing integer indices that would sort the index if used as an indexer.
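To show roughly what argsort returns, here is a pure-Python sketch over a plain list. It is not cuDF's GPU implementation. The sketch relies on Python's stable sort, so tied values keep their original relative order in both directions. Whether cuDF guarantees the same tie order is not stated here and should be treated as an assumption.

```python
from typing import List, Sequence

# Pure-Python sketch of argsort semantics; not cuDF's implementation.


def argsort(values: Sequence, ascending: bool = True) -> List[int]:
    """Return the positions that would sort `values`."""
    # Sort positions by the value they point to; reverse=True keeps
    # the sort stable, so ties stay in their original order.
    return sorted(range(len(values)), key=values.__getitem__,
                  reverse=not ascending)


vals = [10, 100, 1, 1000]
asc = argsort(vals)                      # [2, 0, 1, 3]
desc = argsort(vals, ascending=False)    # [3, 1, 0, 2]

# Using the result as an indexer yields the sorted values.
sorted_vals = [vals[i] for i in asc]     # [1, 10, 100, 1000]
```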
-
asin()¶ Get Trigonometric inverse sine, element-wise.
The inverse of sine so that, if y = x.sin(), then x = y.asin()
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5]) >>> ser.asin() 0 -1.570796 1 0.000000 2 1.570796 3 0.330314 4 0.523599 dtype: float64
asin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5], ... 'second': [0.234, 0.3, 0.1]}) >>> df first second 0 -1.0 0.234 1 0.0 0.300 2 0.5 0.100 >>> df.asin() first second 0 -1.570796 0.236190 1 0.000000 0.304693 2 0.523599 0.100167
asin operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64') >>> index.asin() Float64Index([-1.5707963267948966, 0.41151684606748806, 1.5707963267948966, 0.3046926540153975], dtype='float64')
-
astype(dtype, copy=False)¶ Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.
- Parameters
- dtypenumpy dtype
Use a numpy.dtype to cast entire Index object to.
- copybool, default False
If copy is True, astype always returns a newly allocated object. If copy is False (the default) and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.
- Returns
- Index
Index with values cast to specified dtype.
-
atan()¶ Get Trigonometric inverse tangent, element-wise.
The inverse of tan so that, if y = x.tan(), then x = y.atan()
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10]) >>> ser 0 -1.00000 1 0.00000 2 1.00000 3 0.32434 4 0.50000 5 -10.00000 dtype: float64 >>> ser.atan() 0 -0.785398 1 0.000000 2 0.785398 3 0.313635 4 0.463648 5 -1.471128 dtype: float64
atan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5], ... 'second': [0.234, 0.3, 10]}) >>> df first second 0 -1.0 0.234 1 -10.0 0.300 2 0.5 10.000 >>> df.atan() first second 0 -0.785398 0.229864 1 -1.471128 0.291457 2 0.463648 1.471128
atan operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64') >>> index.atan() Float64Index([-0.7853981633974483, 0.3805063771123649, 0.7853981633974483, 0.0, 0.2914567944778671], dtype='float64')
-
clip(lower=None, upper=None, inplace=False, axis=1)¶ Trim values at input threshold(s).
Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.
- Parameters
- lowerscalar or array_like, default None
Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.
- upperscalar or array_like, default None
Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series/Index, upper is expected to be a scalar or an array of size 1.
- inplacebool, default False
Whether to perform the operation in place on the data.
- Returns
- Clipped DataFrame/Series/Index/MultiIndex
Examples
>>> import cudf >>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']}) >>> df.clip(lower=[2, 'b'], upper=[3, 'c']) a b 0 2 b 1 2 b 2 3 c 3 3 c
>>> df.clip(lower=None, upper=[3, 'c']) a b 0 1 a 1 2 b 2 3 c 3 3 c
>>> df.clip(lower=[2, 'b'], upper=None) a b 0 2 b 1 2 b 2 3 c 3 4 d
>>> df.clip(lower=2, upper=3, inplace=True) >>> df a b 0 2 2 1 2 3 2 3 3 3 3 3
>>> import cudf >>> sr = cudf.Series([1, 2, 3, 4]) >>> sr.clip(lower=2, upper=3) 0 2 1 2 2 3 3 3 dtype: int64
>>> sr.clip(lower=None, upper=3) 0 1 1 2 2 3 3 3 dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True) >>> sr 0 2 1 2 2 3 3 4 dtype: int64
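The element-wise rule behind clip can be sketched in pure Python for a single column of scalars. Values below lower are raised to lower, values above upper are lowered to upper, and a None bound disables that side. This simplified illustration is not cuDF's implementation, and it ignores nulls, per-column thresholds, and inplace.

```python
from typing import List, Optional

# Pure-Python sketch of clip semantics; not cuDF's implementation.


def clip(values: List[float], lower: Optional[float] = None,
         upper: Optional[float] = None) -> List[float]:
    """Clip each value into [lower, upper]; a None bound is not applied."""
    out = []
    for v in values:
        if lower is not None and v < lower:
            v = lower  # below the minimum threshold
        if upper is not None and v > upper:
            v = upper  # above the maximum threshold
        out.append(v)
    return out


print(clip([1, 2, 3, 4], lower=2, upper=3))     # [2, 2, 3, 3]
print(clip([1, 2, 3, 4], lower=None, upper=3))  # [1, 2, 3, 3]
print(clip([1, 2, 3, 4], lower=2))              # [2, 2, 3, 4]
```

These results agree with the Series examples above.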
-
copy(deep=True)¶ Make a copy of this object.
- Parameters
- deepbool, default True
Make a deep copy of the data. With deep=False, the data is not copied.
- Returns
- copyIndex
-
cos()¶ Get Trigonometric cosine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360]) >>> ser 0 0.00000 1 0.32434 2 0.50000 3 45.00000 4 90.00000 5 180.00000 6 360.00000 dtype: float64 >>> ser.cos() 0 1.000000 1 0.947861 2 0.877583 3 0.525322 4 -0.448074 5 -0.598460 6 -0.283691 dtype: float64
cos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15], ... 'second': [100.0, 360, 720, 300]}) >>> df first second 0 0.0 100.0 1 5.0 360.0 2 10.0 720.0 3 15.0 300.0 >>> df.cos() first second 0 1.000000 0.862319 1 0.283662 -0.283691 2 -0.839072 -0.839039 3 -0.759688 -0.022097
cos operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90]) >>> index Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64') >>> index.cos() Float64Index([ 0.9210609940028851, 0.8623188722876839, -0.5984600690578581, -0.4480736161291701], dtype='float64')
-
difference(other, sort=None)¶ Return a new Index with elements from the index that are not in other.
This is the set difference of two Index objects.
- Parameters
- otherIndex or array-like
- sortFalse or None, default None
Whether to sort the resulting index. By default, the values are attempted to be sorted, but any TypeError from incomparable elements is caught by cudf.
None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.
False : Do not sort the result.
- Returns
- differenceIndex
Examples
>>> import cudf >>> idx1 = cudf.Index([2, 1, 3, 4]) >>> idx1 Int64Index([2, 1, 3, 4], dtype='int64') >>> idx2 = cudf.Index([3, 4, 5, 6]) >>> idx2 Int64Index([3, 4, 5, 6], dtype='int64') >>> idx1.difference(idx2) Int64Index([1, 2], dtype='int64') >>> idx1.difference(idx2, sort=False) Int64Index([2, 1], dtype='int64')
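The following pure-Python sketch covers the set-difference semantics, including how sort=None falls back when elements cannot be compared. It assumes hashable values and, as in pandas, that each value appears at most once in the result. It is an illustration, not cuDF's implementation.

```python
from typing import Hashable, List, Optional, Sequence

# Pure-Python sketch of Index.difference semantics; not cuDF's implementation.


def difference(left: Sequence[Hashable], right: Sequence[Hashable],
               sort: Optional[bool] = None) -> List[Hashable]:
    """Values of `left` that are not in `right`."""
    exclude = set(right)
    seen = set()
    result = []
    for v in left:
        if v not in exclude and v not in seen:
            seen.add(v)
            result.append(v)  # keep order of first appearance in `left`
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            pass  # incomparable elements: return the unsorted result
    return result


print(difference([2, 1, 3, 4], [3, 4, 5, 6]))              # [1, 2]
print(difference([2, 1, 3, 4], [3, 4, 5, 6], sort=False))  # [2, 1]
print(difference(["a", 1], []))                            # ['a', 1]
```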
-
drop_duplicates(keep='first')¶ Return Index with duplicate values removed
- Parameters
- keep{‘first’, ‘last’, False}, default ‘first’
‘first’ : Drop duplicates except for the first occurrence.
‘last’ : Drop duplicates except for the last occurrence.
False : Drop all duplicates.
- Returns
- deduplicatedIndex
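The three keep options can be sketched in pure Python as follows. The sketch assumes that the result preserves the order in which the retained values appear in the input. cuDF's implementation and its output order may differ.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Union

# Pure-Python sketch of the `keep` options; output order is an assumption.


def drop_duplicates(values: Sequence[Hashable],
                    keep: Union[str, bool] = "first") -> List[Hashable]:
    """Remove duplicate values according to `keep`."""
    if keep == "first":
        seen = set()
        out = []
        for v in values:
            if v not in seen:
                seen.add(v)
                out.append(v)
        return out
    if keep == "last":
        # Keep first occurrences of the reversed input, then restore order.
        return drop_duplicates(list(values)[::-1], "first")[::-1]
    if keep is False:
        counts = Counter(values)
        return [v for v in values if counts[v] == 1]  # only non-duplicated values
    raise ValueError("keep must be 'first', 'last', or False")


vals = [1, 2, 1, 3, 2]
print(drop_duplicates(vals))               # [1, 2, 3]
print(drop_duplicates(vals, keep="last"))  # [1, 3, 2]
print(drop_duplicates(vals, keep=False))   # [3]
```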
-
dropna(how='any')¶ Return an Index with null values removed.
- Parameters
- how{‘any’, ‘all’}, default ‘any’
If the Index is a MultiIndex, drop the value when any or all levels are NaN.
- Returns
- validIndex
Examples
>>> import cudf >>> index = cudf.Index(['a', None, 'b', 'c']) >>> index StringIndex(['a' None 'b' 'c'], dtype='object') >>> index.dropna() StringIndex(['a' 'b' 'c'], dtype='object')
Using dropna on a MultiIndex:
>>> midx = cudf.MultiIndex( ... levels=[[1, None, 4, None], [1, 2, 5]], ... codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]], ... names=["x", "y"], ... ) >>> midx MultiIndex(levels=[0 1 1 null 2 4 3 null dtype: int64, 0 1 1 2 2 5 dtype: int64], codes= x y 0 0 0 1 0 2 2 1 1 3 2 1 4 3 0) >>> midx.dropna() MultiIndex(levels=[0 1 1 4 dtype: int64, 0 1 1 2 2 5 dtype: int64], codes= x y 0 0 0 1 0 2 2 1 1)
-
property
dtype¶ dtype of the underlying values in GenericIndex.
-
property
empty¶ Indicator whether Index is empty.
True if Index is entirely empty (no items).
- Returns
- outbool
True if the Index is empty, False otherwise.
-
equals(other)¶ Determine if two Index objects contain the same elements.
- Returns
- out: bool
True if “other” is an Index and it has the same elements as calling index; False otherwise.
-
exp()¶ Get the exponential of all elements, element-wise.
Exponential is the inverse of the log function, so that x.exp().log() = x
- Returns
- DataFrame/Series/Index
Result of the element-wise exponential.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100]) >>> ser 0 -1.00000 1 0.00000 2 1.00000 3 0.32434 4 0.50000 5 -10.00000 6 100.00000 dtype: float64 >>> ser.exp() 0 3.678794e-01 1 1.000000e+00 2 2.718282e+00 3 1.383117e+00 4 1.648721e+00 5 4.539993e-05 6 2.688117e+43 dtype: float64
exp operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5], ... 'second': [0.234, 0.3, 10]}) >>> df first second 0 -1.0 0.234 1 -10.0 0.300 2 0.5 10.000 >>> df.exp() first second 0 0.367879 1.263644 1 0.000045 1.349859 2 1.648721 22026.465795
exp operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64') >>> index.exp() Float64Index([0.36787944117144233, 1.4918246976412703, 2.718281828459045, 1.0, 1.3498588075760032], dtype='float64')
-
fillna(value, downcast=None)¶ Fill null values with the specified value.
- Parameters
- valuescalar
Scalar value to use to fill nulls. This value cannot be a list-like.
- downcastdict, default is None
This parameter is currently NON-FUNCTIONAL.
- Returns
- filledIndex
Examples
>>> import cudf >>> index = cudf.Index([1, 2, None, 4]) >>> index Int64Index([1, 2, null, 4], dtype='int64') >>> index.fillna(3) Int64Index([1, 2, 3, 4], dtype='int64')
-
find_label_range(first, last)¶ Find range that starts with first and ends with last, inclusively.
- Returns
- begin, end2-tuple of int
The starting index and the ending index. The last value occurs at position end - 1.
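For an index sorted in ascending order, the returned pair behaves like a half-open slice [begin, end). Here, begin is the first position whose label is at least first, and end is one past the last position whose label is at most last. The following minimal sketch expresses that idea with Python's bisect module. The ascending-sort requirement and the bisect-based approach are assumptions for illustration, not a description of cuDF's code.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence, Tuple

# Pure-Python sketch; assumes `labels` is sorted ascending.


def find_label_range(labels: Sequence, first, last) -> Tuple[int, int]:
    """Return (begin, end) covering labels in [first, last], inclusively."""
    begin = bisect_left(labels, first)  # first position with label >= first
    end = bisect_right(labels, last)    # one past the last label <= last
    return begin, end


labels = [1, 2, 2, 4, 7, 9]
begin, end = find_label_range(labels, 2, 7)
print(begin, end)           # 1 5
print(labels[begin:end])    # [2, 2, 4, 7]
print(labels[end - 1])      # 7, the last value occurs at end - 1
```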
-
classmethod
from_pandas(index, nan_as_null=None)¶ Convert from a Pandas Index.
- Parameters
- indexPandas Index object
A Pandas Index object which has to be converted to cuDF Index.
- nan_as_nullbool, Default None
If None or True, converts np.nan values to null values. If False, leaves np.nan values as is.
- Raises
- TypeError for invalid input type.
Examples
>>> import cudf >>> import pandas as pd >>> import numpy as np >>> data = [10, 20, 30, np.nan] >>> pdi = pd.Index(data) >>> cudf.core.index.Index.from_pandas(pdi) Float64Index([10.0, 20.0, 30.0, null], dtype='float64') >>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False) Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
-
get_level_values(level)¶ Return an Index of values for requested level.
This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.
- Parameters
- levelint or str
It is either the integer position or the name of the level.
- Returns
- Index
Calling object, as there is only one level in the Index.
See also
cudf.core.multiindex.MultiIndex.get_level_valuesGet values for a level of a MultiIndex.
Notes
For Index, level should be 0, since there are no multiple levels.
Examples
>>> import cudf >>> idx = cudf.core.index.StringIndex(["a","b","c"]) >>> idx.get_level_values(0) StringIndex(['a' 'b' 'c'], dtype='object')
-
get_slice_bound(label, side, kind)¶ Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.
- Parameters
- labelobject
- side{‘left’, ‘right’}
- kind{‘ix’, ‘loc’, ‘getitem’}
- Returns
- int
Index of label.
-
property
gpu_values¶ View the data as a numba device array object
-
interleave_columns()¶ Interleave the Series columns of a table into a single column.
Converts the column-major table cols into a single row-major column.
- Parameters
- colsinput Table containing columns to interleave.
- Returns
- The interleaved columns as a single column
Examples
>>> import cudf >>> df = cudf.DataFrame({0: ['A1', 'A2', 'A3'], 1: ['B1', 'B2', 'B3']}) >>> df 0 1 0 A1 B1 1 A2 B2 2 A3 B3 >>> df.interleave_columns() 0 A1 1 B1 2 A2 3 B2 4 A3 5 B3
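Conceptually, interleaving reads the table row by row and emits each row's values in column order. Below is a pure-Python sketch over a list of equal-length columns. It is an illustration only, not cuDF's GPU implementation.

```python
from typing import List, Sequence

# Pure-Python sketch of interleaving; not cuDF's implementation.


def interleave_columns(columns: Sequence[Sequence]) -> List:
    """Flatten column-major data into a single row-major list."""
    # zip(*columns) yields one tuple per row, in column order.
    return [value for row in zip(*columns) for value in row]


cols = [["A1", "A2", "A3"], ["B1", "B2", "B3"]]
print(interleave_columns(cols))  # ['A1', 'B1', 'A2', 'B2', 'A3', 'B3']
```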
-
property
is_monotonic¶ Alias for is_monotonic_increasing.
-
property
is_monotonic_decreasing¶ Return if the index is monotonic decreasing (only equal or decreasing) values.
-
property
is_monotonic_increasing¶ Return if the index is monotonic increasing (only equal or increasing) values.
-
property
is_unique¶ Return if the index has unique values.
-
isin(values)¶ Return a boolean array where the index values are in values.
Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.
- Parameters
- valuesset, list-like, Index
Sought values.
- Returns
- is_containedcupy array
CuPy array of boolean values.
-
isna()¶ Identify missing values. Alias for isnull.
-
isnull()¶ Identify missing values.
-
join(other, how='left', level=None, return_indexers=False, sort=False)¶ Compute join_index and indexers to conform data structures to the new index.
- Parameters
- otherIndex.
- how{‘left’, ‘right’, ‘inner’, ‘outer’}
- return_indexersbool, default False
- sortbool, default False
Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).
- Returns: index
Examples
>>> import cudf >>> lhs = cudf.DataFrame( ... {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b'] ... ).index >>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index >>> lhs.join(rhs, how='inner') MultiIndex(levels=[0 1 1 3 dtype: int64, 0 2 1 4 dtype: int64], codes= a b 0 1 1 1 0 0)
-
log()¶ Get the natural logarithm of all elements, element-wise.
Natural logarithm is the inverse of the exp function, so that x.log().exp() = x
- Returns
- DataFrame/Series/Index
Result of the element-wise natural logarithm.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100]) >>> ser 0 -1.00000 1 0.00000 2 1.00000 3 0.32434 4 0.50000 5 -10.00000 6 100.00000 dtype: float64 >>> ser.log() 0 NaN 1 -inf 2 0.000000 3 -1.125963 4 -0.693147 5 NaN 6 4.605170 dtype: float64
log operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5], ... 'second': [0.234, 0.3, 10]}) >>> df first second 0 -1.0 0.234 1 -10.0 0.300 2 0.5 10.000 >>> df.log() first second 0 NaN -1.452434 1 NaN -1.203973 2 -0.693147 2.302585
log operation on Index:
>>> index = cudf.Index([10, 11, 500.0]) >>> index Float64Index([10.0, 11.0, 500.0], dtype='float64') >>> index.log() Float64Index([2.302585092994046, 2.3978952727983707, 6.214608098422191], dtype='float64')
-
mask(cond, other=None, inplace=False)¶ Replace values where the condition is True.
- Parameters
- condbool Series/DataFrame, array-like
Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.
- other: scalar, list of scalars, Series/DataFrame
Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.
A DataFrame expects only a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.
A Series expects only a scalar or a Series-like object of the same length.
- inplacebool, default False
Whether to perform the operation in place on the data.
- Returns
- Same type as caller
Examples
>>> import cudf >>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]}) >>> df.mask(df % 2 == 0, [-1, -1]) A B 0 1 3 1 -1 5 2 5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0]) >>> ser.mask(ser > 2, 10) 0 10 1 10 2 2 3 1 4 0 dtype: int64 >>> ser.mask(ser > 2) 0 null 1 null 2 2 3 1 4 0 dtype: int64
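mask is the complement of where: mask(cond, other) replaces exactly the entries that where(cond, other) would keep. The following sketch shows that relationship on plain lists, using None as a stand-in for null. These helper functions are illustrative and are not cuDF's API.

```python
from typing import Any, List, Sequence

# Pure-Python sketch; None stands in for null. Not cuDF's implementation.


def where(values: Sequence, cond: Sequence[bool], other: Any = None) -> List:
    """Keep values where cond is True; replace the rest with `other`."""
    return [v if c else other for v, c in zip(values, cond)]


def mask(values: Sequence, cond: Sequence[bool], other: Any = None) -> List:
    """Replace values where cond is True, i.e. where() with cond inverted."""
    return where(values, [not c for c in cond], other)


ser = [4, 3, 2, 1, 0]
cond = [v > 2 for v in ser]
print(mask(ser, cond, 10))  # [10, 10, 2, 1, 0]
print(mask(ser, cond))      # [None, None, 2, 1, 0]
```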
-
max()¶ Return the maximum value of the Index.
- Returns
- scalar
Maximum value.
See also
Index.minReturn the minimum value in an Index.
cudf.core.series.Series.maxReturn the maximum value in a Series.
cudf.core.dataframe.DataFrame.maxReturn the maximum values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.max() 3
-
memory_usage(deep=False)¶ Memory usage of the values.
- Parameters
- deepbool
Introspect the data deeply, interrogate object dtypes for system-level memory consumption.
- Returns
- bytes used
-
min()¶ Return the minimum value of the Index.
- Returns
- scalar
Minimum value.
See also
Index.maxReturn the maximum value in an Index.
cudf.core.series.Series.minReturn the minimum value in a Series.
cudf.core.dataframe.DataFrame.minReturn the minimum values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.min() 1
-
property
name¶ Returns the name of the Index.
-
property
names¶ Returns a tuple containing the name of the Index.
-
property
ndim¶ Dimension of the data. Apart from MultiIndex ndim is always 1.
-
notna()¶ Identify non-missing values. Alias for notnull.
-
notnull()¶ Identify non-missing values.
-
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)¶ Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.
- Parameters
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
Index to direct ranking.
- method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’
How to rank the group of records that have the same value (i.e. ties):
* average: average rank of the group
* min: lowest rank in the group
* max: highest rank in the group
* first: ranks assigned in order they appear in the array
* dense: like ‘min’, but rank always increases by 1 between groups.
- numeric_onlybool, optional
For DataFrame objects, rank only numeric columns if set to True.
- na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’
How to rank NaN values:
* keep: assign NaN rank to NaN values
* top: assign smallest rank to NaN values if ascending
* bottom: assign highest rank to NaN values if ascending.
- ascendingbool, default True
Whether or not the elements should be ranked in ascending order.
- pctbool, default False
Whether or not to display the returned rankings in percentile form.
- Returns
- same type as caller
Return a Series or DataFrame with data ranks as values.
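The tie-breaking methods are easiest to compare side by side. Below is a pure-Python sketch for a single list without missing values. It ignores na_option, pct, and axis, and it returns float ranks as pandas does; whether cuDF uses the same output dtype is an assumption. The sketch illustrates the definitions above rather than cuDF's implementation.

```python
from typing import List, Sequence

# Pure-Python sketch of the rank methods; not cuDF's implementation.


def rank(values: Sequence, method: str = "average",
         ascending: bool = True) -> List[float]:
    """Rank values 1 through n, resolving ties according to `method`."""
    if method not in ("average", "min", "max", "first", "dense"):
        raise ValueError(f"unknown method {method!r}")
    # Stable sort of positions, so ties keep their order of appearance.
    order = sorted(range(len(values)), key=values.__getitem__,
                   reverse=not ascending)
    ranks = [0.0] * len(values)
    group = 0  # dense rank of the current group of tied values
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        group += 1
        for k in range(i, j + 1):
            if method == "average":
                r = (i + j) / 2 + 1     # mean of 1-based ranks i+1 .. j+1
            elif method == "min":
                r = i + 1
            elif method == "max":
                r = j + 1
            elif method == "first":
                r = k + 1               # order of appearance among ties
            else:                       # "dense"
                r = group
            ranks[order[k]] = float(r)
        i = j + 1
    return ranks


data = [3, 1, 4, 1, 5]
print(rank(data))             # [3.0, 1.5, 4.0, 1.5, 5.0]
print(rank(data, "min"))      # [3.0, 1.0, 4.0, 1.0, 5.0]
print(rank(data, "max"))      # [3.0, 2.0, 4.0, 2.0, 5.0]
print(rank(data, "first"))    # [3.0, 1.0, 4.0, 2.0, 5.0]
print(rank(data, "dense"))    # [2.0, 1.0, 3.0, 1.0, 4.0]
```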
-
rename(name, inplace=False)¶ Alter Index name.
By default, returns a new Index.
- Parameters
- namelabel
Name(s) to set.
- inplacebool, default False
Modifies the object directly, instead of creating a new Index.
- Returns
- Index
-
repeat(repeats, axis=None)¶ Repeats elements consecutively.
Returns a new object of caller type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.
- Parameters
- repeatsint, or array of ints
The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.
- Returns
- Series/DataFrame/Index
A newly created object of same type as caller with repeated elements.
Examples
>>> import cudf >>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]}) >>> df a b 0 1 10 1 2 20 2 3 30 >>> df.repeat(3) a b 0 1 10 0 1 10 0 1 10 1 2 20 1 2 20 1 2 20 2 3 30 2 3 30 2 3 30
Repeat on Series
>>> s = cudf.Series([0, 2]) >>> s 0 0 1 2 dtype: int64 >>> s.repeat([3, 4]) 0 0 0 0 0 0 1 2 1 2 1 2 1 2 dtype: int64 >>> s.repeat(2) 0 0 0 0 1 2 1 2 dtype: int64
Repeat on Index
>>> index = cudf.Index([10, 22, 33, 55]) >>> index Int64Index([10, 22, 33, 55], dtype='int64') >>> index.repeat(5) Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33, 33, 33, 33, 33, 55, 55, 55, 55, 55], dtype='int64')
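The scalar and per-element forms of repeats can be sketched on plain lists. This sketch reproduces only the values in the Series examples above, not the repeated index labels. Its error handling is an assumption, not cuDF's documented behavior.

```python
from typing import List, Sequence, Union

# Pure-Python sketch of repeat semantics; error handling is an assumption.


def repeat(values: Sequence, repeats: Union[int, Sequence[int]]) -> List:
    """Repeat each element consecutively."""
    if isinstance(repeats, int):
        repeats = [repeats] * len(values)   # a scalar applies to every element
    if len(repeats) != len(values):
        raise ValueError("repeats must be a scalar or match len(values)")
    if any(r < 0 for r in repeats):
        raise ValueError("repeats must be non-negative")
    return [v for v, r in zip(values, repeats) for _ in range(r)]


print(repeat([0, 2], [3, 4]))  # [0, 0, 0, 2, 2, 2, 2]
print(repeat([0, 2], 2))       # [0, 0, 2, 2]
print(repeat([0, 2], 0))       # []
```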
-
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)¶ Return a random sample of items from an axis of object.
You can use random_state for reproducibility.
- Parameters
- nint, optional
Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.
- fracfloat, optional
Fraction of axis items to return. Cannot be used with n.
- replacebool, default False
Allow or disallow sampling of the same row more than once. replace == True is not yet supported for axis = 1/”columns”
- weightsstr or ndarray-like, optional
Only supported for axis=1/”columns”
- random_stateint or None, default None
Seed for the random number generator (if int), or None. If None, a random seed will be chosen.
- axis{0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index don’t support axis=1.
- Returns
- Series or DataFrame or Index
A new object of same type as caller containing n items randomly sampled from the caller object.
Examples
>>> import cudf >>> df = cudf.DataFrame({"a":[1, 2, 3, 4, 5]}) >>> df.sample(3) a 1 2 3 4 0 1
>>> sr = cudf.Series([1, 2, 3, 4, 5]) >>> sr.sample(10, replace=True) 1 4 3 1 2 4 0 5 0 1 4 5 4 1 0 2 0 3 3 2 dtype: int64
>>> df = cudf.DataFrame( ... {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]}) >>> df.sample(2, axis=1) a c 0 1 3 1 2 4
-
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)¶ Scatter to a list of dataframes.
Uses map_index to determine the destination of each row of the original DataFrame.
- Parameters
- map_indexSeries, str or list-like
Scatter assignment for each row
- map_sizeint
Length of output list. Must be greater than or equal to the number of unique values in map_index.
- keep_indexbool
Conserve original index values for each row
- Returns
- A list of cudf.DataFrame objects.
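Here is a pure-Python sketch of the scattering step. It assumes map_index holds integer destination ids in the range [0, map_size); the column-name form of map_index is not modeled. Each row is appended to the output list selected by its map_index entry, so rows keep their original relative order within each destination. The size check compares against the largest id, which is a simplification.

```python
from typing import List, Optional, Sequence

# Pure-Python sketch; assumes integer destination ids. Not cuDF's implementation.


def scatter_by_map(rows: Sequence, map_index: Sequence[int],
                   map_size: Optional[int] = None) -> List[list]:
    """Split `rows` into map_size lists according to map_index."""
    if len(rows) != len(map_index):
        raise ValueError("map_index must have one entry per row")
    largest = max(map_index, default=-1)
    if map_size is None:
        map_size = largest + 1
    if largest >= map_size:
        raise ValueError("map_size is too small for the ids in map_index")
    out: List[list] = [[] for _ in range(map_size)]
    for row, dest in zip(rows, map_index):
        out[dest].append(row)  # rows keep their relative order per destination
    return out


parts = scatter_by_map(["r0", "r1", "r2", "r3"], [1, 0, 1, 2], map_size=3)
print(parts)  # [['r1'], ['r0', 'r2'], ['r3']]
```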
-
searchsorted(values, side='left', ascending=True, na_position='last')¶ Find indices where elements should be inserted to maintain order
- Parameters
- valuesFrame (Shape must be consistent with self)
Values to be hypothetically inserted into self.
- sidestr {‘left’, ‘right’} optional, default ‘left’
If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.
- ascendingbool optional, default True
Sorted Frame is in ascending order (otherwise descending).
- na_positionstr {‘last’, ‘first’} optional, default ‘last’
Position of null values in sorted order.
- Returns
- 1-D cupy array of insertion points
Examples
>>> s = cudf.Series([1, 2, 3]) >>> s.searchsorted(4) 3 >>> s.searchsorted([0, 4]) array([0, 3], dtype=int32) >>> s.searchsorted([1, 3], side='left') array([0, 2], dtype=int32) >>> s.searchsorted([1, 3], side='right') array([1, 3], dtype=int32)
If the values are not monotonically sorted, wrong locations may be returned:
>>> s = cudf.Series([2, 1, 3]) >>> s.searchsorted(1) 0 # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]}) >>> df a b 0 1 10 1 3 12 2 5 14 3 7 16 >>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6], ... 'b': [10, 11, 13, 15]}) >>> values_df a b 0 0 10 1 2 11 2 5 13 3 6 15 >>> df.searchsorted(values_df, ascending=False) array([4, 4, 4, 0], dtype=int32)
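For a single sorted list of numbers, the side and ascending options map onto Python's bisect module. bisect_left gives the first suitable position and bisect_right gives the last. The sketch below handles descending data by negating it, which assumes numeric values. It ignores nulls and na_position and is not cuDF's implementation.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence

# Pure-Python sketch for numeric, null-free data; not cuDF's implementation.


def searchsorted(sorted_values: Sequence[float], values: Sequence[float],
                 side: str = "left", ascending: bool = True) -> List[int]:
    """Insertion points that keep `sorted_values` in order."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    find = bisect_left if side == "left" else bisect_right
    if ascending:
        return [find(sorted_values, v) for v in values]
    # Descending data: negate so bisect sees an ascending sequence.
    negated = [-x for x in sorted_values]
    return [find(negated, -v) for v in values]


s = [1, 2, 3]
print(searchsorted(s, [0, 4]))                # [0, 3]
print(searchsorted(s, [1, 3], side="left"))   # [0, 2]
print(searchsorted(s, [1, 3], side="right"))  # [1, 3]
print(searchsorted([7, 5, 3, 1], [6, 5], ascending=False))  # [1, 1]
```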
-
property
shape¶ Returns a tuple representing the dimensionality of the Index.
-
shift(periods=1, freq=None, axis=0, fill_value=None)¶ Shift values by periods positions.
-
sin()¶ Get Trigonometric sine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360]) >>> ser 0 0.00000 1 0.32434 2 0.50000 3 45.00000 4 90.00000 5 180.00000 6 360.00000 dtype: float64 >>> ser.sin() 0 0.000000 1 0.318683 2 0.479426 3 0.850904 4 0.893997 5 -0.801153 6 0.958916 dtype: float64
sin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15], ... 'second': [100.0, 360, 720, 300]}) >>> df first second 0 0.0 100.0 1 5.0 360.0 2 10.0 720.0 3 15.0 300.0 >>> df.sin() first second 0 0.000000 -0.506366 1 -0.958924 0.958916 2 -0.544021 -0.544072 3 0.650288 -0.999756
sin operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90]) >>> index Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64') >>> index.sin() Float64Index([-0.3894183423086505, -0.5063656411097588, 0.8011526357338306, 0.8939966636005579], dtype='float64')
-
property
size¶ Return the number of elements in the underlying data.
- Returns
- sizeSize of the DataFrame / Index / Series / MultiIndex
Examples
Size of an empty dataframe is 0.
>>> import cudf >>> df = cudf.DataFrame() >>> df Empty DataFrame Columns: [] Index: [] >>> df.size 0 >>> df = cudf.DataFrame(index=[1, 2, 3]) >>> df Empty DataFrame Columns: [] Index: [1, 2, 3] >>> df.size 0
DataFrame with values
>>> df = cudf.DataFrame({'a': [10, 11, 12], ... 'b': ['hello', 'rapids', 'ai']}) >>> df a b 0 10 hello 1 11 rapids 2 12 ai >>> df.size 6 >>> df.index RangeIndex(start=0, stop=3) >>> df.index.size 3
Size of an Index
>>> index = cudf.Index([]) >>> index Float64Index([], dtype='float64') >>> index.size 0 >>> index = cudf.Index([1, 2, 3, 10]) >>> index Int64Index([1, 2, 3, 10], dtype='int64') >>> index.size 4
Size of a MultiIndex
>>> midx = cudf.MultiIndex( ... levels=[["a", "b", "c", None], ["1", None, "5"]], ... codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]], ... names=["x", "y"], ... ) >>> midx MultiIndex(levels=[0 a 1 b 2 c 3 None dtype: object, 0 1 1 None 2 5 dtype: object], codes= x y 0 0 0 1 0 2 2 1 1 3 2 1 4 3 0) >>> midx.size 5
-
sort_values(return_indexer=False, ascending=True, key=None)¶ Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
- Parameters
- return_indexerbool, default False
Should the indices that would sort the index be returned.
- ascendingbool, default True
Should the index values be sorted in an ascending order.
- keyNone, optional
This parameter is NON-FUNCTIONAL.
- Returns
- sorted_indexIndex
Sorted copy of the index.
- indexercupy.ndarray, optional
The indices that the index itself was sorted by.
See also
cudf.core.series.Series.sort_valuesSort values of a Series.
cudf.core.dataframe.DataFrame.sort_valuesSort values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([10, 100, 1, 1000]) >>> idx Int64Index([10, 100, 1, 1000], dtype='int64')
Sort values in ascending order (default behavior).
>>> idx.sort_values() Int64Index([1, 10, 100, 1000], dtype='int64')
Sort values in descending order, and also get the indices idx was sorted by.
>>> idx.sort_values(ascending=False, return_indexer=True) (Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2], dtype=int32))
-
sqrt()¶ Get the non-negative square-root of all elements, element-wise.
- Returns
- DataFrame/Series/Index
Result of the non-negative square-root of each element.
Examples
>>> import cudf >>> ser = cudf.Series([10, 25, 81, 1.0, 100]) >>> ser 0 10.0 1 25.0 2 81.0 3 1.0 4 100.0 dtype: float64 >>> ser.sqrt() 0 3.162278 1 5.000000 2 9.000000 3 1.000000 4 10.000000 dtype: float64
sqrt operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-10.0, 100, 625], ... 'second': [1, 2, 0.4]}) >>> df first second 0 -10.0 1.0 1 100.0 2.0 2 625.0 0.4 >>> df.sqrt() first second 0 NaN 1.000000 1 10.0 1.414214 2 25.0 0.632456
sqrt operation on Index:
>>> index = cudf.Index([-10.0, 100, 625]) >>> index Float64Index([-10.0, 100.0, 625.0], dtype='float64') >>> index.sqrt() Float64Index([nan, 10.0, 25.0], dtype='float64')
-
sum()¶ Return the sum of all values of the Index.
- Returns
- scalar
Sum of all values.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.sum() 6
-
take(indices)¶ Gather only the specific subset of indices
- Parameters
- indices: An array-like that maps to values contained in this Index.
-
tan()¶ Get Trigonometric tangent, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360]) >>> ser 0 0.00000 1 0.32434 2 0.50000 3 45.00000 4 90.00000 5 180.00000 6 360.00000 dtype: float64 >>> ser.tan() 0 0.000000 1 0.336213 2 0.546302 3 1.619775 4 -1.995200 5 1.338690 6 -3.380140 dtype: float64
tan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15], ... 'second': [100.0, 360, 720, 300]}) >>> df first second 0 0.0 100.0 1 5.0 360.0 2 10.0 720.0 3 15.0 300.0 >>> df.tan() first second 0 0.000000 -0.587214 1 -3.380515 -3.380140 2 0.648361 0.648446 3 -0.855993 45.244742
tan operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90]) >>> index Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64') >>> index.tan() Float64Index([-0.4227932187381618, -0.587213915156929, -1.3386902103511544, -1.995200412208242], dtype='float64')
-
tile(count)¶ Repeats the rows from self DataFrame count times to form a new DataFrame.
- Parameters
- selfinput Table containing the rows to tile.
- countNumber of times to tile “rows”. Must be non-negative.
- Returns
- The table containing the tiled “rows”.
Examples
>>> import cudf >>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]]) >>> count = 2 >>> df.tile(count) 0 1 2 0 8 4 7 1 5 2 3 0 8 4 7 1 5 2 3
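Tiling repeats the whole block of rows, whereas repeat repeats each row individually. Below is a pure-Python sketch on a list of rows. It is illustrative only, and the non-negative check mirrors the parameter description.

```python
from typing import List, Sequence

# Pure-Python sketch of tile semantics; not cuDF's implementation.


def tile(rows: Sequence[Sequence], count: int) -> List[list]:
    """Stack `count` copies of all rows, one full copy after another."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return [list(row) for _ in range(count) for row in rows]


print(tile([[8, 4, 7], [5, 2, 3]], 2))
# [[8, 4, 7], [5, 2, 3], [8, 4, 7], [5, 2, 3]]
```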
-
to_array(fillna=None)¶ Get a dense numpy array for the data.
- Parameters
- fillnastr or None
Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.
Notes
If fillna is None, null values are skipped. Therefore, the output size could be smaller.
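The effect of fillna on the output length can be sketched with plain lists, using None for null. The sketch assumes that filling with NaN means converting the values to float so they can hold NaN. It illustrates the two documented modes and is not cuDF's implementation.

```python
import math
from typing import List, Optional, Sequence

# Pure-Python sketch; None stands in for null, float conversion is an assumption.


def to_array(values: Sequence[Optional[float]],
             fillna: Optional[str] = None) -> List[float]:
    """Densify values: skip nulls (fillna=None) or fill them with NaN."""
    if fillna is None:
        return [v for v in values if v is not None]  # output may be shorter
    if fillna == "pandas":
        return [math.nan if v is None else float(v) for v in values]
    raise ValueError("fillna must be None or 'pandas'")


print(to_array([1, None, 3]))            # [1, 3]
print(to_array([1, None, 3], "pandas"))  # [1.0, nan, 3.0]
```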
-
to_arrow()¶ Convert Index to a PyArrow Array.
Examples
>>> import cudf >>> idx = cudf.Index([-3, 10, 15, 20]) >>> idx.to_arrow() <pyarrow.lib.Int64Array object at 0x7fcaa6f53440> [ -3, 10, 15, 20 ]
-
to_dlpack()¶ Converts a cuDF object into a DLPack tensor.
DLPack is an open-source memory tensor structure: dmlc/dlpack.
This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.
- Parameters
- cudf_objDataFrame, Series, Index, or Column
- Returns
- pycapsule_objPyCapsule
Output DLPack tensor pointer which is encapsulated in a PyCapsule object.
-
to_frame(index=True, name=None)¶ Create a DataFrame with a column containing this Index
- Parameters
- indexboolean, default True
Set the index of the returned DataFrame as the original Index
- namestr, default None
Name to be used for the column
- Returns
- DataFrame
cudf DataFrame
-
to_pandas()¶ Convert to a Pandas Index.
Examples
>>> import cudf >>> idx = cudf.Index([-3, 10, 15, 20]) >>> idx Int64Index([-3, 10, 15, 20], dtype='int64') >>> idx.to_pandas() Int64Index([-3, 10, 15, 20], dtype='int64') >>> type(idx.to_pandas()) <class 'pandas.core.indexes.numeric.Int64Index'> >>> type(idx) <class 'cudf.core.index.GenericIndex'>
-
to_series(index=None, name=None)¶ Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.
- Parameters
- indexIndex, optional
Index of resulting Series. If None, defaults to original index.
- namestr, optional
Name of resulting Series. If None, defaults to name of original index.
- Returns
- Series
The dtype will be based on the type of the Index values.
-
unique()¶ Return unique values in the index.
- Returns
- Index without duplicates
-
property
values¶ Return an array representing the data in the Index.
- Returns
- arrayA cupy array of data in the Index.
Examples
>>> import cudf >>> index = cudf.Index([1, -10, 100, 20]) >>> index.values array([ 1, -10, 100, 20]) >>> type(index.values) <class 'cupy.core.core.ndarray'>
-
property
values_host¶ Return a numpy representation of the Index.
Only the values in the Index will be returned.
- Returns
- outnumpy.ndarray
The values of the Index.
Examples
>>> import cudf >>> index = cudf.Index([1, -10, 100, 20]) >>> index.values_host array([ 1, -10, 100, 20]) >>> type(index.values_host) <class 'numpy.ndarray'>
-
where(cond, other=None)¶ Replace values where the condition is False.
- Parameters
- condbool array-like with the same length as self
Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.
- other: scalar, or array-like
Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.
- Returns
- Same type as caller
Examples
>>> import cudf >>> index = cudf.Index([4, 3, 2, 1, 0]) >>> index Int64Index([4, 3, 2, 1, 0], dtype='int64') >>> index.where(index > 2, 15) Int64Index([4, 3, 15, 15, 15], dtype='int64')
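The replacement rule of where can be sketched in plain Python over a list of values. This is a minimal sketch, not cuDF's GPU implementation. The helper name where_sketch is hypothetical, and None stands in for a null entry.

```python
def where_sketch(values, cond, other=None):
    """Keep values[i] where cond[i] is True; otherwise use `other`.

    `other` may be a scalar or a list with the same length as `values`.
    None stands in for a null entry.
    """
    if len(cond) != len(values):
        raise ValueError("cond must have the same length as values")
    other_is_list = isinstance(other, (list, tuple))
    if other_is_list and len(other) != len(values):
        raise ValueError("other must be a scalar or match the length of values")
    result = []
    for i, (value, keep) in enumerate(zip(values, cond)):
        if keep:
            result.append(value)  # condition holds: keep the original value
        else:
            result.append(other[i] if other_is_list else other)
    return result


index = [4, 3, 2, 1, 0]
print(where_sketch(index, [v > 2 for v in index], 15))  # [4, 3, 15, 15, 15]
```

With a list for other, each position takes its replacement from the same position of other.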
UInt32Index¶
-
class
cudf.core.index.UInt32Index(data=None, dtype=None, copy=False, name=None)¶ Immutable, ordered and sliceable sequence of integer labels. The basic object storing row labels for all cuDF objects.
- Parameters
- dataarray-like (1-dimensional)/ DataFrame
If it is a DataFrame, it will return a MultiIndex
- dtypeNumPy dtype (default: object)
If dtype is None, we find the dtype that best fits the data.
- copybool
Make a copy of input data.
- nameobject
Name to be stored in the index.
- tupleize_colsbool (default: True)
When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.
- Returns
- Index
cudf Index
Examples
>>> import cudf >>> cudf.Index([1, 2, 3], dtype="uint64", name="a") UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]})) MultiIndex(levels=[0 1 1 2 dtype: int64, 0 2 1 3 dtype: int64], codes= a b 0 0 0 1 1 1)
- Attributes
dtypedtype of the underlying values in GenericIndex.
emptyIndicator whether Index is empty.
gpu_valuesView the data as a numba device array object
is_monotonicAlias for is_monotonic_increasing.
is_monotonic_decreasingReturn if the index is monotonic decreasing (only equal or decreasing) values.
is_monotonic_increasingReturn if the index is monotonic increasing (only equal or increasing) values.
is_uniqueReturn if the index has unique values.
nameReturns the name of the Index.
namesReturns a tuple containing the name of the Index.
ndimDimension of the data.
shapeReturns a tuple representing the dimensionality of the Index.
sizeReturn the number of elements in the underlying data.
valuesReturn an array representing the data in the Index.
values_hostReturn a numpy representation of the Index.
Methods
acos()Get Trigonometric inverse cosine, element-wise.
any()Return whether any element is True in the Index.
append(other)Append a collection of Index options together.
argsort([ascending])Return the integer indices that would sort the index.
asin()Get Trigonometric inverse sine, element-wise.
astype(dtype[, copy])Create an Index with values cast to dtypes.
atan()Get Trigonometric inverse tangent, element-wise.
clip([lower, upper, inplace, axis])Trim values at input threshold(s).
copy([deep])Make a copy of this object.
cos()Get Trigonometric cosine, element-wise.
difference(other[, sort])Return a new Index with elements from the index that are not in other.
drop_duplicates([keep])Return Index with duplicate values removed
dropna([how])Return an Index with null values removed.
equals(other)Determine if two Index objects contain the same elements.
exp()Get the exponential of all elements, element-wise.
fillna(value[, downcast])Fill null values with the specified value.
find_label_range(first, last)Find range that starts with first and ends with last, inclusively.
from_pandas(index[, nan_as_null])Convert from a Pandas Index.
get_level_values(level)Return an Index of values for requested level.
get_slice_bound(label, side, kind)Calculate slice bound that corresponds to given label.
interleave_columns()Interleave Series columns of a table into a single column.
isin(values)Return a boolean array where the index values are in values.
isna()Identify missing values.
isnull()Identify missing values.
join(other[, how, level, return_indexers, sort])Compute join_index and indexers to conform data structures to the new index.
log()Get the natural logarithm of all elements, element-wise.
mask(cond[, other, inplace])Replace values where the condition is True.
max()Return the maximum value of the Index.
memory_usage([deep])Memory usage of the values.
min()Return the minimum value of the Index.
notna()Identify non-missing values.
notnull()Identify non-missing values.
rank([axis, method, numeric_only, …])Compute numerical data ranks (1 through n) along axis.
rename(name[, inplace])Alter Index name.
repeat(repeats[, axis])Repeats elements consecutively.
sample([n, frac, replace, weights, …])Return a random sample of items from an axis of object.
scatter_by_map(map_index[, map_size, keep_index])Scatter to a list of dataframes.
searchsorted(values[, side, ascending, …])Find indices where elements should be inserted to maintain order
shift([periods, freq, axis, fill_value])Shift values by periods positions.
sin()Get Trigonometric sine, element-wise.
sort_values([return_indexer, ascending, key])Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
sqrt()Get the non-negative square-root of all elements, element-wise.
sum()Return the sum of all values of the Index.
take(indices)Gather only the specific subset of indices
tan()Get Trigonometric tangent, element-wise.
tile(count)Repeats the rows from self DataFrame count times to form a new DataFrame.
to_array([fillna])Get a dense numpy array for the data.
to_arrow()Convert Index to a PyArrow Array.
to_dlpack()Converts a cuDF object into a DLPack tensor.
to_frame([index, name])Create a DataFrame with a column containing this Index
to_pandas()Convert to a Pandas Index.
to_series([index, name])Create a Series with both index and values equal to the index keys.
unique()Return unique values in the index.
where(cond[, other])Replace values where the condition is False.
replace
-
acos()¶ Get Trigonometric inverse cosine, element-wise.
The inverse of cos so that, if y = x.cos(), then x = y.acos()
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5]) >>> ser.acos() 0 3.141593 1 1.570796 2 0.000000 3 1.240482 4 1.047198 dtype: float64
acos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5], ... 'second': [0.234, 0.3, 0.1]}) >>> df first second 0 -1.0 0.234 1 0.0 0.300 2 0.5 0.100 >>> df.acos() first second 0 3.141593 1.334606 1 1.570796 1.266104 2 1.047198 1.470629
acos operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64') >>> index.acos() Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0, 1.5707963267948966, 1.266103672779499], dtype='float64')
-
any()¶ Return whether any element is True in the Index.
-
append(other)¶ Append a collection of Index options together.
- Parameters
- otherIndex or list/tuple of indices
- Returns
- appendedIndex
Examples
>>> import cudf >>> idx = cudf.Index([1, 2, 10, 100]) >>> idx Int64Index([1, 2, 10, 100], dtype='int64') >>> other = cudf.Index([200, 400, 50]) >>> other Int64Index([200, 400, 50], dtype='int64') >>> idx.append(other) Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')
append also accepts a list of Index objects:
>>> idx.append([other, other]) Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
-
argsort(ascending=True, **kwargs)¶ Return the integer indices that would sort the index.
- Parameters
- ascendingbool, default True
If True, returns the indices for ascending order. If False, returns the indices for descending order.
- Returns
- arrayA cupy array containing integer indices that would sort the index if used as an indexer.
-
asin()¶ Get Trigonometric inverse sine, element-wise.
The inverse of sine so that, if y = x.sin(), then x = y.asin()
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5]) >>> ser.asin() 0 -1.570796 1 0.000000 2 1.570796 3 0.330314 4 0.523599 dtype: float64
asin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5], ... 'second': [0.234, 0.3, 0.1]}) >>> df first second 0 -1.0 0.234 1 0.0 0.300 2 0.5 0.100 >>> df.asin() first second 0 -1.570796 0.236190 1 0.000000 0.304693 2 0.523599 0.100167
asin operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64') >>> index.asin() Float64Index([-1.5707963267948966, 0.41151684606748806, 1.5707963267948966, 0.3046926540153975], dtype='float64')
-
astype(dtype, copy=False)¶ Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.
- Parameters
- dtypenumpy dtype
Use a numpy.dtype to cast entire Index object to.
- copybool, default False
If copy is True, a newly allocated object is always returned. If copy is False (the default) and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.
- Returns
- Index
Index with values cast to specified dtype.
-
atan()¶ Get Trigonometric inverse tangent, element-wise.
The inverse of tan so that, if y = x.tan(), then x = y.atan()
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10]) >>> ser 0 -1.00000 1 0.00000 2 1.00000 3 0.32434 4 0.50000 5 -10.00000 dtype: float64 >>> ser.atan() 0 -0.785398 1 0.000000 2 0.785398 3 0.313635 4 0.463648 5 -1.471128 dtype: float64
atan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5], ... 'second': [0.234, 0.3, 10]}) >>> df first second 0 -1.0 0.234 1 -10.0 0.300 2 0.5 10.000 >>> df.atan() first second 0 -0.785398 0.229864 1 -1.471128 0.291457 2 0.463648 1.471128
atan operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64') >>> index.atan() Float64Index([-0.7853981633974483, 0.3805063771123649, 0.7853981633974483, 0.0, 0.2914567944778671], dtype='float64')
-
clip(lower=None, upper=None, inplace=False, axis=1)¶ Trim values at input threshold(s).
Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.
- Parameters
- lowerscalar or array_like, default None
Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.
- upperscalar or array_like, default None
Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series/Index, upper is expected to be a scalar or an array of size 1.
- inplacebool, default False
Whether to perform the operation in place on the data.
- Returns
- Clipped DataFrame/Series/Index/MultiIndex
Examples
>>> import cudf >>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']}) >>> df.clip(lower=[2, 'b'], upper=[3, 'c']) a b 0 2 b 1 2 b 2 3 c 3 3 c
>>> df.clip(lower=None, upper=[3, 'c']) a b 0 1 a 1 2 b 2 3 c 3 3 c
>>> df.clip(lower=[2, 'b'], upper=None) a b 0 2 b 1 2 b 2 3 c 3 4 d
>>> df.clip(lower=2, upper=3, inplace=True) >>> df a b 0 2 2 1 2 3 2 3 3 3 3 3
>>> import cudf >>> sr = cudf.Series([1, 2, 3, 4]) >>> sr.clip(lower=2, upper=3) 0 2 1 2 2 3 3 3 dtype: int64
>>> sr.clip(lower=None, upper=3) 0 1 1 2 2 3 3 3 dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True) >>> sr 0 2 1 2 2 3 3 4 dtype: int64
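The clamping rule for a single column can be sketched in plain Python; the per-column thresholds in the DataFrame examples apply the same rule column by column. clip_sketch is a hypothetical helper, not cuDF's implementation, and it only assumes the values support ordering comparisons.

```python
def clip_sketch(values, lower=None, upper=None):
    """Clamp every value into [lower, upper]; a bound of None is skipped."""
    clipped = []
    for v in values:
        if lower is not None and v < lower:
            v = lower  # raise values below the lower threshold
        if upper is not None and v > upper:
            v = upper  # cap values above the upper threshold
        clipped.append(v)
    return clipped


print(clip_sketch([1, 2, 3, 4], lower=2, upper=3))  # [2, 2, 3, 3]
print(clip_sketch(["a", "b", "c", "d"], lower="b", upper="c"))  # ['b', 'b', 'c', 'c']
```

The string case mirrors column b in the first DataFrame example, where lexicographic order decides the clamping.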
-
copy(deep=True)¶ Make a copy of this object.
- Parameters
- deepbool, default True
Make a deep copy of the data. With deep=False, the data is not copied.
- Returns
- copyIndex
-
cos()¶ Get Trigonometric cosine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360]) >>> ser 0 0.00000 1 0.32434 2 0.50000 3 45.00000 4 90.00000 5 180.00000 6 360.00000 dtype: float64 >>> ser.cos() 0 1.000000 1 0.947861 2 0.877583 3 0.525322 4 -0.448074 5 -0.598460 6 -0.283691 dtype: float64
cos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15], ... 'second': [100.0, 360, 720, 300]}) >>> df first second 0 0.0 100.0 1 5.0 360.0 2 10.0 720.0 3 15.0 300.0 >>> df.cos() first second 0 1.000000 0.862319 1 0.283662 -0.283691 2 -0.839072 -0.839039 3 -0.759688 -0.022097
cos operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90]) >>> index Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64') >>> index.cos() Float64Index([ 0.9210609940028851, 0.8623188722876839, -0.5984600690578581, -0.4480736161291701], dtype='float64')
-
difference(other, sort=None)¶ Return a new Index with elements from the index that are not in other.
This is the set difference of two Index objects.
- Parameters
- otherIndex or array-like
- sortFalse or None, default None
Whether to sort the resulting index. By default, cuDF attempts to sort the values but catches any TypeError raised by comparing incomparable elements.
None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.
False : Do not sort the result.
- Returns
- differenceIndex
Examples
>>> import cudf >>> idx1 = cudf.Index([2, 1, 3, 4]) >>> idx1 Int64Index([2, 1, 3, 4], dtype='int64') >>> idx2 = cudf.Index([3, 4, 5, 6]) >>> idx2 Int64Index([3, 4, 5, 6], dtype='int64') >>> idx1.difference(idx2) Int64Index([1, 2], dtype='int64') >>> idx1.difference(idx2, sort=False) Int64Index([2, 1], dtype='int64')
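The set difference and the two sort modes can be sketched with Python sets. difference_sketch is a hypothetical helper operating on lists; with sort=False it keeps first-appearance order, which matches the [2, 1] result above but is an assumption about cuDF in general.

```python
def difference_sketch(left, right, sort=None):
    """Unique elements of `left` that do not appear in `right`."""
    exclude = set(right)
    seen = set()
    result = []
    for value in left:
        if value not in exclude and value not in seen:
            seen.add(value)
            result.append(value)  # first appearance order
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            pass  # incomparable elements: leave the result unsorted
    elif sort is not False:
        raise ValueError("sort must be None or False")
    return result


print(difference_sketch([2, 1, 3, 4], [3, 4, 5, 6]))  # [1, 2]
print(difference_sketch([2, 1, 3, 4], [3, 4, 5, 6], sort=False))  # [2, 1]
```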
-
drop_duplicates(keep='first')¶ Return Index with duplicate values removed
- Parameters
- keep{‘first’, ‘last’, False}, default ‘first’
- ‘first’ : Drop duplicates except for the first occurrence.
- ‘last’ : Drop duplicates except for the last occurrence.
- False : Drop all duplicates.
- Returns
- deduplicatedIndex
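Which occurrence each keep option retains can be sketched in plain Python. drop_duplicates_sketch is hypothetical and returns survivors in their original positional order; cuDF's actual result may come back in a different order (for example, sorted), so treat this only as a model of which values survive.

```python
from collections import Counter


def drop_duplicates_sketch(values, keep="first"):
    """Remove duplicates, keeping the first, the last, or no occurrence."""
    if keep == "first":
        seen = set()
        kept = []
        for v in values:
            if v not in seen:
                seen.add(v)
                kept.append(v)
        return kept
    if keep == "last":
        # Remember the final position of each value, then keep only that one.
        last_position = {v: i for i, v in enumerate(values)}
        return [v for i, v in enumerate(values) if last_position[v] == i]
    if keep is False:
        counts = Counter(values)
        return [v for v in values if counts[v] == 1]  # drop every duplicated value
    raise ValueError("keep must be 'first', 'last', or False")


print(drop_duplicates_sketch([3, 1, 3, 2, 1], keep="last"))  # [3, 2, 1]
```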
-
dropna(how='any')¶ Return an Index with null values removed.
- Parameters
- how{‘any’, ‘all’}, default ‘any’
If the Index is a MultiIndex, drop the value when any or all levels are NaN.
- Returns
- validIndex
Examples
>>> import cudf >>> index = cudf.Index(['a', None, 'b', 'c']) >>> index StringIndex(['a' None 'b' 'c'], dtype='object') >>> index.dropna() StringIndex(['a' 'b' 'c'], dtype='object')
Using dropna on a MultiIndex:
>>> midx = cudf.MultiIndex( ... levels=[[1, None, 4, None], [1, 2, 5]], ... codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]], ... names=["x", "y"], ... ) >>> midx MultiIndex(levels=[0 1 1 null 2 4 3 null dtype: int64, 0 1 1 2 2 5 dtype: int64], codes= x y 0 0 0 1 0 2 2 1 1 3 2 1 4 3 0) >>> midx.dropna() MultiIndex(levels=[0 1 1 4 dtype: int64, 0 1 1 2 2 5 dtype: int64], codes= x y 0 0 0 1 0 2 2 1 1)
-
property
dtype¶ dtype of the underlying values in GenericIndex.
-
property
empty¶ Indicator whether Index is empty.
True if Index is entirely empty (no items).
- Returns
- outbool
True if the Index is empty, False otherwise.
-
equals(other)¶ Determine if two Index objects contain the same elements.
- Returns
- out: bool
True if “other” is an Index and it has the same elements as calling index; False otherwise.
-
exp()¶ Get the exponential of all elements, element-wise.
Exponential is the inverse of the log function, so that x.exp().log() = x
- Returns
- DataFrame/Series/Index
Result of the element-wise exponential.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100]) >>> ser 0 -1.00000 1 0.00000 2 1.00000 3 0.32434 4 0.50000 5 -10.00000 6 100.00000 dtype: float64 >>> ser.exp() 0 3.678794e-01 1 1.000000e+00 2 2.718282e+00 3 1.383117e+00 4 1.648721e+00 5 4.539993e-05 6 2.688117e+43 dtype: float64
exp operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5], ... 'second': [0.234, 0.3, 10]}) >>> df first second 0 -1.0 0.234 1 -10.0 0.300 2 0.5 10.000 >>> df.exp() first second 0 0.367879 1.263644 1 0.000045 1.349859 2 1.648721 22026.465795
exp operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64') >>> index.exp() Float64Index([0.36787944117144233, 1.4918246976412703, 2.718281828459045, 1.0, 1.3498588075760032], dtype='float64')
-
fillna(value, downcast=None)¶ Fill null values with the specified value.
- Parameters
- valuescalar
Scalar value used to fill nulls. This value cannot be list-like.
- downcastdict, default is None
This parameter is currently NON-FUNCTIONAL.
- Returns
- filledIndex
Examples
>>> import cudf >>> index = cudf.Index([1, 2, None, 4]) >>> index Int64Index([1, 2, null, 4], dtype='int64') >>> index.fillna(3) Int64Index([1, 2, 3, 4], dtype='int64')
-
find_label_range(first, last)¶ Find range that starts with first and ends with last, inclusively.
- Returns
- begin, end2-tuple of int
The starting index and the ending index. The last value occurs at position end - 1.
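Assuming the Index is sorted in increasing order, the (begin, end) pair behaves like a half-open slice, which a binary search can sketch. find_label_range_sketch is a hypothetical helper over a plain Python list, not cuDF's code path; how cuDF treats unsorted indexes is not covered here.

```python
import bisect


def find_label_range_sketch(sorted_labels, first, last):
    """Half-open [begin, end) range covering labels from first to last."""
    begin = bisect.bisect_left(sorted_labels, first)  # first position >= first
    end = bisect.bisect_right(sorted_labels, last)  # one past the last position <= last
    return begin, end


labels = [1, 3, 5, 7, 9]
begin, end = find_label_range_sketch(labels, 3, 7)
print(begin, end, labels[end - 1])  # 1 4 7
```

Because end is exclusive, labels[begin:end] selects exactly the labels in the inclusive range.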
-
classmethod
from_pandas(index, nan_as_null=None)¶ Convert from a Pandas Index.
- Parameters
- indexPandas Index object
A Pandas Index object which has to be converted to cuDF Index.
- nan_as_nullbool, Default None
If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.
- Raises
- TypeError for invalid input type.
Examples
>>> import cudf >>> import pandas as pd >>> import numpy as np >>> data = [10, 20, 30, np.nan] >>> pdi = pd.Index(data) >>> cudf.core.index.Index.from_pandas(pdi) Float64Index([10.0, 20.0, 30.0, null], dtype='float64') >>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False) Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
-
get_level_values(level)¶ Return an Index of values for requested level.
This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.
- Parameters
- levelint or str
It is either the integer position or the name of the level.
- Returns
- Index
Calling object, as there is only one level in the Index.
See also
cudf.core.multiindex.MultiIndex.get_level_valuesGet values for a level of a MultiIndex.
Notes
For Index, level should be 0, since there are no multiple levels.
Examples
>>> import cudf >>> idx = cudf.core.index.StringIndex(["a","b","c"]) >>> idx.get_level_values(0) StringIndex(['a' 'b' 'c'], dtype='object')
-
get_slice_bound(label, side, kind)¶ Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.
- Parameters
- labelobject
- side{‘left’, ‘right’}
- kind{‘ix’, ‘loc’, ‘getitem’}
- Returns
- int
Index of label.
-
property
gpu_values¶ View the data as a numba device array object
-
interleave_columns()¶ Interleave Series columns of a table into a single column.
Converts the column major table cols into a row major column.
- Parameters
- colsinput Table containing columns to interleave.
- Returns
- The interleaved columns as a single column
Examples
>>> import cudf >>> df = cudf.DataFrame({0: ['A1', 'A2', 'A3'], 1: ['B1', 'B2', 'B3']}) >>> df 0 1 0 A1 B1 1 A2 B2 2 A3 B3 >>> df.interleave_columns() 0 A1 1 B1 2 A2 3 B2 4 A3 5 B3
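The column-major to row-major conversion can be sketched with zip: walk the rows and emit each column's value for that row. interleave_columns_sketch is a hypothetical helper that takes the columns as plain lists; it is not cuDF's implementation.

```python
def interleave_columns_sketch(columns):
    """Row-major interleave: emit row 0 from every column, then row 1, and so on."""
    if len({len(column) for column in columns}) > 1:
        raise ValueError("all columns must have the same length")
    return [value for row in zip(*columns) for value in row]


print(interleave_columns_sketch([["A1", "A2", "A3"], ["B1", "B2", "B3"]]))
# ['A1', 'B1', 'A2', 'B2', 'A3', 'B3']
```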
-
property
is_monotonic¶ Alias for is_monotonic_increasing.
-
property
is_monotonic_decreasing¶ Return if the index is monotonic decreasing (only equal or decreasing) values.
-
property
is_monotonic_increasing¶ Return if the index is monotonic increasing (only equal or increasing) values.
-
property
is_unique¶ Return if the index has unique values.
-
isin(values)¶ Return a boolean array where the index values are in values.
Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.
- Parameters
- valuesset, list-like, Index
Sought values.
- Returns
- is_containedcupy array
CuPy array of boolean values.
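The membership test can be sketched with a Python set, producing one boolean per index entry so the output length matches the index. isin_sketch is hypothetical and returns a list rather than a CuPy array.

```python
def isin_sketch(index_values, values):
    """Return one bool per index entry: True where it appears in `values`."""
    lookup = set(values)  # hashing makes each membership test O(1) on average
    return [v in lookup for v in index_values]


print(isin_sketch([1, 2, 3, 4], [2, 4, 6]))  # [False, True, False, True]
```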
-
isna()¶ Identify missing values. Alias for isnull.
-
isnull()¶ Identify missing values.
-
join(other, how='left', level=None, return_indexers=False, sort=False)¶ Compute join_index and indexers to conform data structures to the new index.
- Parameters
- otherIndex.
- how{‘left’, ‘right’, ‘inner’, ‘outer’}
- return_indexersbool, default False
- sortbool, default False
Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).
- Returns
- index
Examples
>>> import cudf >>> lhs = cudf.DataFrame( ... {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b'] ... ).index >>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index >>> lhs.join(rhs, how='inner') MultiIndex(levels=[0 1 1 3 dtype: int64, 0 2 1 4 dtype: int64], codes= a b 0 1 1 1 0 0)
-
log()¶ Get the natural logarithm of all elements, element-wise.
Natural logarithm is the inverse of the exp function, so that x.log().exp() = x
- Returns
- DataFrame/Series/Index
Result of the element-wise natural logarithm.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100]) >>> ser 0 -1.00000 1 0.00000 2 1.00000 3 0.32434 4 0.50000 5 -10.00000 6 100.00000 dtype: float64 >>> ser.log() 0 NaN 1 -inf 2 0.000000 3 -1.125963 4 -0.693147 5 NaN 6 4.605170 dtype: float64
log operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5], ... 'second': [0.234, 0.3, 10]}) >>> df first second 0 -1.0 0.234 1 -10.0 0.300 2 0.5 10.000 >>> df.log() first second 0 NaN -1.452434 1 NaN -1.203973 2 -0.693147 2.302585
log operation on Index:
>>> index = cudf.Index([10, 11, 500.0]) >>> index Float64Index([10.0, 11.0, 500.0], dtype='float64') >>> index.log() Float64Index([2.302585092994046, 2.3978952727983707, 6.214608098422191], dtype='float64')
-
mask(cond, other=None, inplace=False)¶ Replace values where the condition is True.
- Parameters
- condbool Series/DataFrame, array-like
Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.
- other: scalar, list of scalars, Series/DataFrame
Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.
For DataFrame, other must be a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.
For Series, other must be a scalar or a Series-like object of the same length.
- inplacebool, default False
Whether to perform the operation in place on the data.
- Returns
- Same type as caller
Examples
>>> import cudf >>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]}) >>> df.mask(df % 2 == 0, [-1, -1]) A B 0 1 3 1 -1 5 2 5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0]) >>> ser.mask(ser > 2, 10) 0 10 1 10 2 2 3 1 4 0 dtype: int64 >>> ser.mask(ser > 2) 0 null 1 null 2 2 3 1 4 0 dtype: int64
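The DataFrame example passes one replacement value per column ([-1, -1]). That per-column behaviour can be sketched with a dict of lists standing in for the frame. mask_columns is a hypothetical helper, and only the list-of-scalars form of other is modelled.

```python
def mask_columns(table, cond, other):
    """Replace table[name][i] with that column's entry of `other` where cond is True.

    `table` and `cond` are dicts of equal-length lists keyed by column name;
    `other` holds one replacement scalar per column, in column order.
    """
    if len(other) != len(table):
        raise ValueError("other needs one value per column")
    masked = {}
    for (name, column), replacement in zip(table.items(), other):
        masked[name] = [
            replacement if hit else value  # True in cond means "replace"
            for value, hit in zip(column, cond[name])
        ]
    return masked


df = {"A": [1, 4, 5], "B": [3, 5, 8]}
even = {name: [v % 2 == 0 for v in col] for name, col in df.items()}
print(mask_columns(df, even, [-1, -1]))  # {'A': [1, -1, 5], 'B': [3, 5, -1]}
```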
-
max()¶ Return the maximum value of the Index.
- Returns
- scalar
Maximum value.
See also
Index.minReturn the minimum value in an Index.
cudf.core.series.Series.maxReturn the maximum value in a Series.
cudf.core.dataframe.DataFrame.maxReturn the maximum values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.max() 3
-
memory_usage(deep=False)¶ Memory usage of the values.
- Parameters
- deepbool
Introspect the data deeply, interrogate object dtypes for system-level memory consumption.
- Returns
- bytes used
-
min()¶ Return the minimum value of the Index.
- Returns
- scalar
Minimum value.
See also
Index.maxReturn the maximum value in an Index.
cudf.core.series.Series.minReturn the minimum value in a Series.
cudf.core.dataframe.DataFrame.minReturn the minimum values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.min() 1
-
property
name¶ Returns the name of the Index.
-
property
names¶ Returns a tuple containing the name of the Index.
-
property
ndim¶ Dimension of the data. Apart from MultiIndex, ndim is always 1.
-
notna()¶ Identify non-missing values. Alias for notnull.
-
notnull()¶ Identify non-missing values.
-
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)¶ Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.
- Parameters
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
Index to direct ranking.
- method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’
How to rank the group of records that have the same value (i.e. ties):
* average: average rank of the group
* min: lowest rank in the group
* max: highest rank in the group
* first: ranks assigned in the order they appear in the array
* dense: like ‘min’, but the rank always increases by 1 between groups.
- numeric_onlybool, optional
For DataFrame objects, rank only numeric columns if set to True.
- na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’
How to rank NaN values:
* keep: assign NaN rank to NaN values
* top: assign the smallest rank to NaN values if ascending
* bottom: assign the highest rank to NaN values if ascending.
- ascendingbool, default True
Whether or not the elements should be ranked in ascending order.
- pctbool, default False
Whether or not to display the returned rankings in percentile form.
- Returns
- same type as caller
Return a Series or DataFrame with data ranks as values.
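The five tie-breaking methods can be sketched by walking a stable sort and handling each run of equal values as one group. rank_sketch is a hypothetical plain-Python helper; it ignores na_option and pct and assumes the input has no missing values.

```python
def rank_sketch(values, method="average", ascending=True):
    """1-based ranks; ties are resolved according to `method`."""
    n = len(values)
    # Stable sort of positions; reverse=True still keeps ties in input order.
    order = sorted(range(n), key=values.__getitem__, reverse=not ascending)
    ranks = [0.0] * n
    group_number = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1  # sorted positions i..j hold one group of equal values
        group_number += 1
        for k in range(i, j + 1):
            pos = order[k]
            if method == "average":
                ranks[pos] = (i + 1 + j + 1) / 2
            elif method == "min":
                ranks[pos] = float(i + 1)
            elif method == "max":
                ranks[pos] = float(j + 1)
            elif method == "first":
                ranks[pos] = float(k + 1)
            elif method == "dense":
                ranks[pos] = float(group_number)
            else:
                raise ValueError(f"unknown method {method!r}")
        i = j + 1
    return ranks


print(rank_sketch([10, 20, 10, 30]))  # [1.5, 3.0, 1.5, 4.0]
print(rank_sketch([10, 20, 10, 30], method="dense"))  # [1.0, 2.0, 1.0, 3.0]
```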
-
rename(name, inplace=False)¶ Alter Index name.
Defaults to returning new index.
- Parameters
- namelabel
Name(s) to set.
- Returns
- Index
-
repeat(repeats, axis=None)¶ Repeats elements consecutively.
Returns a new object of the caller's type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.
- Parameters
- repeatsint, or array of ints
The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.
- Returns
- Series/DataFrame/Index
A newly created object of same type as caller with repeated elements.
Examples
>>> import cudf >>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]}) >>> df a b 0 1 10 1 2 20 2 3 30 >>> df.repeat(3) a b 0 1 10 0 1 10 0 1 10 1 2 20 1 2 20 1 2 20 2 3 30 2 3 30 2 3 30
Repeat on Series
>>> s = cudf.Series([0, 2]) >>> s 0 0 1 2 dtype: int64 >>> s.repeat([3, 4]) 0 0 0 0 0 0 1 2 1 2 1 2 1 2 dtype: int64 >>> s.repeat(2) 0 0 0 0 1 2 1 2 dtype: int64
Repeat on Index
>>> index = cudf.Index([10, 22, 33, 55]) >>> index Int64Index([10, 22, 33, 55], dtype='int64') >>> index.repeat(5) Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33, 33, 33, 33, 33, 55, 55, 55, 55, 55], dtype='int64')
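The repeat rule, with either a single count or one count per element, can be sketched as a nested comprehension. repeat_sketch is a hypothetical helper over a plain list, not cuDF's implementation.

```python
def repeat_sketch(values, repeats):
    """Repeat each element consecutively; `repeats` is an int or one int per element."""
    if isinstance(repeats, int):
        repeats = [repeats] * len(values)
    if len(repeats) != len(values):
        raise ValueError("repeats must be an int or match the number of elements")
    if any(r < 0 for r in repeats):
        raise ValueError("repeats must be non-negative")
    return [value for value, count in zip(values, repeats) for _ in range(count)]


print(repeat_sketch([0, 2], [3, 4]))  # [0, 0, 0, 2, 2, 2, 2]
```

Repeating 0 times yields an empty list, matching the empty-object behaviour described above.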
-
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)¶ Return a random sample of items from an axis of object.
You can use random_state for reproducibility.
- Parameters
- nint, optional
Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.
- fracfloat, optional
Fraction of axis items to return. Cannot be used with n.
- replacebool, default False
Allow or disallow sampling of the same row more than once. replace=True is not yet supported for axis=1/‘columns’.
- weightsstr or ndarray-like, optional
Only supported for axis=1/‘columns’.
- random_stateint or None, default None
Seed for the random number generator (if int), or None. If None, a random seed will be chosen.
- axis{0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index don’t support axis=1.
- Returns
- Series or DataFrame or Index
A new object of same type as caller containing n items randomly sampled from the caller object.
Examples
>>> import cudf >>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]}) >>> df.sample(3) a 1 2 3 4 0 1
>>> sr = cudf.Series([1, 2, 3, 4, 5]) >>> sr.sample(10, replace=True) 1 4 3 1 2 4 0 5 0 1 4 5 4 1 0 2 0 3 3 2 dtype: int64
>>> df = cudf.DataFrame( ... {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]}) >>> df.sample(2, axis=1) a c 0 1 3 1 2 4
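How n, frac, replace, and random_state interact can be sketched with Python's random module. sample_sketch is hypothetical: it uses Python's RNG, so a given seed will not reproduce the rows cuDF picks; it only shows that a fixed seed makes a draw repeatable.

```python
import random


def sample_sketch(values, n=None, frac=None, replace=False, random_state=None):
    """Draw n items (or a fraction `frac` of them) from `values`."""
    if n is not None and frac is not None:
        raise ValueError("pass either n or frac, not both")
    if n is None:
        n = 1 if frac is None else round(frac * len(values))
    rng = random.Random(random_state)  # a fixed seed gives a repeatable draw
    if replace:
        return [rng.choice(values) for _ in range(n)]  # items may repeat
    if n > len(values):
        raise ValueError("n exceeds the number of items; use replace=True")
    return rng.sample(values, n)  # distinct positions, no repeats


draw = sample_sketch([1, 2, 3, 4, 5], n=3, random_state=42)
```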
-
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)¶ Scatter to a list of dataframes.
Uses map_index to determine the destination of each row of the original DataFrame.
- Parameters
- map_indexSeries, str or list-like
Scatter assignment for each row.
- map_sizeint
Length of the output list. Must be greater than or equal to the number of unique values in map_index.
- keep_indexbool
Conserve the original index values for each row.
- Returns
- A list of cudf.DataFrame objects.
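The routing step can be sketched by treating each map_index entry as the destination list number for the matching row. scatter_by_map_sketch is hypothetical; it assumes integer destinations in range(map_size), and its default for map_size is an illustrative choice, not necessarily cuDF's.

```python
def scatter_by_map_sketch(rows, map_index, map_size=None):
    """Route rows[i] to output list number map_index[i]."""
    if len(rows) != len(map_index):
        raise ValueError("map_index needs one destination per row")
    if map_size is None:
        # Hypothetical default: just enough lists for the largest destination.
        map_size = max(map_index, default=-1) + 1
    partitions = [[] for _ in range(map_size)]
    for row, destination in zip(rows, map_index):
        if not 0 <= destination < map_size:
            raise ValueError(f"destination {destination} is outside range({map_size})")
        partitions[destination].append(row)  # input order is kept within each list
    return partitions


print(scatter_by_map_sketch(["r0", "r1", "r2", "r3"], [1, 0, 1, 2]))
# [['r1'], ['r0', 'r2'], ['r3']]
```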
-
searchsorted(values, side='left', ascending=True, na_position='last')¶ Find indices where elements should be inserted to maintain order
- Parameters
- valuesFrame (Shape must be consistent with self)
Values to be hypothetically inserted into self.
- sidestr {‘left’, ‘right’} optional, default ‘left’
If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.
- ascendingbool optional, default True
Sorted Frame is in ascending order (otherwise descending).
- na_positionstr {‘last’, ‘first’} optional, default ‘last’
Position of null values in sorted order.
- Returns
- 1-D cupy array of insertion points
Examples
>>> import cudf >>> s = cudf.Series([1, 2, 3]) >>> s.searchsorted(4) 3 >>> s.searchsorted([0, 4]) array([0, 3], dtype=int32) >>> s.searchsorted([1, 3], side='left') array([0, 2], dtype=int32) >>> s.searchsorted([1, 3], side='right') array([1, 3], dtype=int32)
If the values are not monotonically sorted, wrong locations may be returned:
>>> s = cudf.Series([2, 1, 3]) >>> s.searchsorted(1) 0 # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]}) >>> df a b 0 1 10 1 3 12 2 5 14 3 7 16 >>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6], ... 'b': [10, 11, 13, 15]}) >>> values_df a b 0 0 10 1 2 11 2 5 13 3 6 15 >>> df.searchsorted(values_df, ascending=False) array([4, 4, 4, 0], dtype=int32)
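The insertion-point rule for a single sorted column can be sketched with Python's bisect module. searchsorted_sketch is a hypothetical helper: its descending branch negates the values, so that branch assumes numeric input, and it omits null handling (na_position) and multi-column frames.

```python
import bisect


def searchsorted_sketch(sorted_values, needles, side="left", ascending=True):
    """Positions where each needle would be inserted to keep the order intact."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    find = bisect.bisect_left if side == "left" else bisect.bisect_right
    # bisect expects ascending input; negating numbers turns a descending
    # sequence into an ascending one without changing any positions.
    sign = 1 if ascending else -1
    keys = [sign * v for v in sorted_values]
    return [find(keys, sign * x) for x in needles]


print(searchsorted_sketch([1, 2, 3], [0, 4]))  # [0, 3]
print(searchsorted_sketch([1, 2, 3], [1, 3], side="right"))  # [1, 3]
```

As with the unsorted Series example above, the result is only meaningful when the input really is sorted in the stated direction.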
-
property
shape¶ Returns a tuple representing the dimensionality of the Index.
-
shift(periods=1, freq=None, axis=0, fill_value=None)¶ Shift values by periods positions.
-
sin()¶ Get Trigonometric sine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360]) >>> ser 0 0.00000 1 0.32434 2 0.50000 3 45.00000 4 90.00000 5 180.00000 6 360.00000 dtype: float64 >>> ser.sin() 0 0.000000 1 0.318683 2 0.479426 3 0.850904 4 0.893997 5 -0.801153 6 0.958916 dtype: float64
sin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15], ... 'second': [100.0, 360, 720, 300]}) >>> df first second 0 0.0 100.0 1 5.0 360.0 2 10.0 720.0 3 15.0 300.0 >>> df.sin() first second 0 0.000000 -0.506366 1 -0.958924 0.958916 2 -0.544021 -0.544072 3 0.650288 -0.999756
sin operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90]) >>> index Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64') >>> index.sin() Float64Index([-0.3894183423086505, -0.5063656411097588, 0.8011526357338306, 0.8939966636005579], dtype='float64')
-
property
size¶ Return the number of elements in the underlying data.
- Returns
- sizeSize of the DataFrame / Index / Series / MultiIndex
Examples
Size of an empty dataframe is 0.
>>> import cudf >>> df = cudf.DataFrame() >>> df Empty DataFrame Columns: [] Index: [] >>> df.size 0 >>> df = cudf.DataFrame(index=[1, 2, 3]) >>> df Empty DataFrame Columns: [] Index: [1, 2, 3] >>> df.size 0
DataFrame with values
>>> df = cudf.DataFrame({'a': [10, 11, 12], ... 'b': ['hello', 'rapids', 'ai']}) >>> df a b 0 10 hello 1 11 rapids 2 12 ai >>> df.size 6 >>> df.index RangeIndex(start=0, stop=3) >>> df.index.size 3
Size of an Index
>>> index = cudf.Index([]) >>> index Float64Index([], dtype='float64') >>> index.size 0 >>> index = cudf.Index([1, 2, 3, 10]) >>> index Int64Index([1, 2, 3, 10], dtype='int64') >>> index.size 4
Size of a MultiIndex
>>> midx = cudf.MultiIndex( ... levels=[["a", "b", "c", None], ["1", None, "5"]], ... codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]], ... names=["x", "y"], ... ) >>> midx MultiIndex(levels=[0 a 1 b 2 c 3 None dtype: object, 0 1 1 None 2 5 dtype: object], codes= x y 0 0 0 1 0 2 2 1 1 3 2 1 4 3 0) >>> midx.size 5
-
sort_values(return_indexer=False, ascending=True, key=None)¶ Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
- Parameters
- return_indexer : bool, default False
Whether to also return the indices that would sort the index.
- ascending : bool, default True
Whether to sort the index values in ascending order.
- key : None, optional
This parameter is currently non-functional.
- Returns
- sorted_index : Index
Sorted copy of the index.
- indexer : cupy.ndarray, optional
The indices that the index itself was sorted by.
See also
cudf.core.series.Series.sort_values : Sort values of a Series.
cudf.core.dataframe.DataFrame.sort_values : Sort values in a DataFrame.
Examples
>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')
Sort values in ascending order (default behavior).
>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')
Sort values in descending order, and also get the indices idx was sorted by.
>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2], dtype=int32))
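The return_indexer contract can be checked without a GPU: gathering the original values at the returned positions reproduces the sorted values. The pure-Python sketch below is illustrative only; `sort_with_indexer` is a hypothetical helper, not part of cuDF, and plain lists stand in for the Index and the cupy indexer.

```python
def sort_with_indexer(values, ascending=True):
    """Hypothetical helper mirroring Index.sort_values(return_indexer=True).

    Returns the sorted values together with the positions that produce them.
    """
    # Sort positions by the value they point at. Python's sort is stable,
    # also with reverse=True, so equal values keep their input order here.
    indexer = sorted(range(len(values)), key=lambda i: values[i],
                     reverse=not ascending)
    # Gathering by the indexer yields the sorted values.
    sorted_values = [values[i] for i in indexer]
    return sorted_values, indexer


vals = [10, 100, 1, 1000]
sorted_vals, indexer = sort_with_indexer(vals, ascending=False)
print(sorted_vals)  # [1000, 100, 10, 1]
print(indexer)      # [3, 1, 0, 2]
```

The descending result matches the cuDF example above: position 3 holds 1000, position 1 holds 100, and so on.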
-
sqrt()¶ Get the non-negative square-root of all elements, element-wise.
- Returns
- DataFrame/Series/Index
Result of the non-negative square-root of each element.
Examples
>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64
sqrt operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-10.0, 100, 625],
...                      'second': [1, 2, 0.4]})
>>> df
   first  second
0  -10.0     1.0
1  100.0     2.0
2  625.0     0.4
>>> df.sqrt()
   first    second
0    NaN  1.000000
1   10.0  1.414214
2   25.0  0.632456
sqrt operation on Index:
>>> index = cudf.Index([-10.0, 100, 625])
>>> index
Float64Index([-10.0, 100.0, 625.0], dtype='float64')
>>> index.sqrt()
Float64Index([nan, 10.0, 25.0], dtype='float64')
-
sum()¶ Return the sum of all values of the Index.
- Returns
- scalar
Sum of all values.
Examples
>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.sum()
6
-
take(indices)¶ Gather only the specific subset of indices.
- Parameters
- indices : array-like
Integer positions of the values to gather from this Index.
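Assuming take follows the pandas convention of gathering by integer position (rather than by label), its semantics reduce to the list comprehension below. This is a pure-Python illustration only; the stand-alone function `take` is hypothetical and is not the cuDF method.

```python
def take(values, indices):
    """Hypothetical pure-Python analogue of Index.take: positional gather."""
    # Each entry of `indices` is a position into `values`; the output has
    # one element per requested position, in the requested order, and
    # positions may repeat.
    return [values[i] for i in indices]


labels = [10, 20, 30, 40]
print(take(labels, [3, 0, 0]))  # [40, 10, 10]
```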
-
tan()¶ Get Trigonometric tangent, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.tan()
0    0.000000
1    0.336213
2    0.546302
3    1.619775
4   -1.995200
5    1.338690
6   -3.380140
dtype: float64
tan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.tan()
      first     second
0  0.000000  -0.587214
1 -3.380515  -3.380140
2  0.648361   0.648446
3 -0.855993  45.244742
tan operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.tan()
Float64Index([-0.4227932187381618, -0.587213915156929, -1.3386902103511544, -1.995200412208242], dtype='float64')
-
tile(count)¶ Repeat the rows of this DataFrame count times to form a new DataFrame.
- Parameters
- self : input Table whose rows are tiled.
- count : int
Number of times to tile the rows. Must be non-negative.
- Returns
- A new table containing the tiled rows.
Examples
>>> import cudf
>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
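As the example shows, tile repeats the whole block of rows, so the original row order cycles. The minimal pure-Python sketch below reproduces that pattern on a list of row tuples; `tile_rows` is a hypothetical name, and the sketch says nothing about how cuDF implements tiling on the GPU.

```python
def tile_rows(rows, count):
    """Hypothetical sketch of DataFrame.tile on a list of row tuples."""
    if count < 0:
        raise ValueError("count must be non-negative")
    # The whole block of rows is repeated `count` times, in order;
    # contrast with repeat(), which repeats each row consecutively.
    return rows * count


rows = [(8, 4, 7), (5, 2, 3)]
print(tile_rows(rows, 2))
# [(8, 4, 7), (5, 2, 3), (8, 4, 7), (5, 2, 3)]
```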
-
to_array(fillna=None)¶ Get a dense numpy array for the data.
- Parameters
- fillna : str or None
Defaults to None, which will skip null values. If it equals "pandas", null values are filled with NaNs. Non integral dtype is promoted to np.float64.
Notes
If fillna is None, null values are skipped. Therefore, the output size could be smaller.
-
to_arrow()¶ Convert Index to a PyArrow Array.
Examples
>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx.to_arrow()
<pyarrow.lib.Int64Array object at 0x7fcaa6f53440>
[
  -3,
  10,
  15,
  20
]
-
to_dlpack()¶ Converts a cuDF object into a DLPack tensor.
DLPack is an open-source memory tensor structure: dmlc/dlpack.
This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.
- Parameters
- cudf_obj : DataFrame, Series, Index, or Column
- Returns
- pycapsule_obj : PyCapsule
Output DLPack tensor pointer which is encapsulated in a PyCapsule object.
-
to_frame(index=True, name=None)¶ Create a DataFrame with a column containing this Index
- Parameters
- index : boolean, default True
Set the index of the returned DataFrame as the original Index.
- name : str, default None
Name to be used for the column.
- Returns
- DataFrame
cudf DataFrame
-
to_pandas()¶ Convert to a Pandas Index.
Examples
>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.numeric.Int64Index'>
>>> type(idx)
<class 'cudf.core.index.GenericIndex'>
-
to_series(index=None, name=None)¶ Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.
- Parameters
- index : Index, optional
Index of resulting Series. If None, defaults to original index.
- name : str, optional
Name of resulting Series. If None, defaults to name of original index.
- Returns
- Series
The dtype will be based on the type of the Index values.
-
unique()¶ Return unique values in the index.
- Returns
- Index without duplicates
-
property
values¶ Return an array representing the data in the Index.
- Returns
- array : cupy.ndarray
A CuPy array of data in the Index.
Examples
>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values
array([  1, -10, 100,  20])
>>> type(index.values)
<class 'cupy.core.core.ndarray'>
-
property
values_host¶ Return a numpy representation of the Index.
Only the values in the Index will be returned.
- Returns
- out : numpy.ndarray
The values of the Index.
Examples
>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values_host
array([  1, -10, 100,  20])
>>> type(index.values_host)
<class 'numpy.ndarray'>
-
where(cond, other=None)¶ Replace values where the condition is False.
- Parameters
- cond : bool array-like with the same length as self
Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.
- other : scalar or array-like
Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.
- Returns
- Same type as caller
Examples
>>> import cudf
>>> index = cudf.Index([4, 3, 2, 1, 0])
>>> index
Int64Index([4, 3, 2, 1, 0], dtype='int64')
>>> index.where(index > 2, 15)
Int64Index([4, 3, 15, 15, 15], dtype='int64')
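The keep-or-replace rule of where can be stated in a few lines. The pure-Python sketch below mirrors the example above, using None as a stand-in for null; the function name `where` is hypothetical here and this is not cuDF's implementation.

```python
def where(values, cond, other=None):
    """Hypothetical sketch of Index.where: keep values where cond is True."""
    if len(cond) != len(values):
        raise ValueError("cond must have the same length as values")
    # A scalar `other` is broadcast; a list supplies one replacement per row.
    others = other if isinstance(other, list) else [other] * len(values)
    return [v if c else o for v, c, o in zip(values, cond, others)]


index = [4, 3, 2, 1, 0]
print(where(index, [v > 2 for v in index], 15))  # [4, 3, 15, 15, 15]
```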
UInt64Index¶
-
class
cudf.core.index.UInt64Index(data=None, dtype=None, copy=False, name=None)¶ Immutable, ordered and sliceable sequence of integer labels. The basic object storing row labels for all cuDF objects.
- Parameters
- data : array-like (1-dimensional) / DataFrame
If it is a DataFrame, it will return a MultiIndex.
- dtype : NumPy dtype (default: object)
If dtype is None, we find the dtype that best fits the data.
- copy : bool
Make a copy of input data.
- name : object
Name to be stored in the index.
- tupleize_cols : bool (default: True)
When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.
- Returns
- Index
cudf Index
Examples
>>> import cudf
>>> cudf.Index([1, 2, 3], dtype="uint64", name="a")
UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]}))
MultiIndex(levels=[0    1
1    2
dtype: int64, 0    2
1    3
dtype: int64],
codes=   a  b
0  0  0
1  1  1)
- Attributes
dtype: dtype of the underlying values in GenericIndex.
empty: Indicator whether Index is empty.
gpu_values: View the data as a numba device array object.
is_monotonic: Alias for is_monotonic_increasing.
is_monotonic_decreasing: Return if the index is monotonic decreasing (only equal or decreasing) values.
is_monotonic_increasing: Return if the index is monotonic increasing (only equal or increasing) values.
is_unique: Return if the index has unique values.
name: Returns the name of the Index.
names: Returns a tuple containing the name of the Index.
ndim: Dimension of the data.
shape: Returns a tuple representing the dimensionality of the Index.
size: Return the number of elements in the underlying data.
values: Return an array representing the data in the Index.
values_host: Return a numpy representation of the Index.
Methods
acos(): Get Trigonometric inverse cosine, element-wise.
any(): Return whether any element is True in the Index.
append(other): Append a collection of Index objects together.
argsort([ascending]): Return the integer indices that would sort the index.
asin(): Get Trigonometric inverse sine, element-wise.
astype(dtype[, copy]): Create an Index with values cast to dtypes.
atan(): Get Trigonometric inverse tangent, element-wise.
clip([lower, upper, inplace, axis]): Trim values at input threshold(s).
copy([deep]): Make a copy of this object.
cos(): Get Trigonometric cosine, element-wise.
difference(other[, sort]): Return a new Index with elements from the index that are not in other.
drop_duplicates([keep]): Return Index with duplicate values removed.
dropna([how]): Return an Index with null values removed.
equals(other): Determine if two Index objects contain the same elements.
exp(): Get the exponential of all elements, element-wise.
fillna(value[, downcast]): Fill null values with the specified value.
find_label_range(first, last): Find range that starts with first and ends with last, inclusively.
from_pandas(index[, nan_as_null]): Convert from a Pandas Index.
get_level_values(level): Return an Index of values for requested level.
get_slice_bound(label, side, kind): Calculate slice bound that corresponds to given label.
interleave_columns(): Interleave Series columns of a table into a single column.
isin(values): Return a boolean array where the index values are in values.
isna(): Identify missing values.
isnull(): Identify missing values.
join(other[, how, level, return_indexers, sort]): Compute join_index and indexers to conform data structures to the new index.
log(): Get the natural logarithm of all elements, element-wise.
mask(cond[, other, inplace]): Replace values where the condition is True.
max(): Return the maximum value of the Index.
memory_usage([deep]): Memory usage of the values.
min(): Return the minimum value of the Index.
notna(): Identify non-missing values.
notnull(): Identify non-missing values.
rank([axis, method, numeric_only, …]): Compute numerical data ranks (1 through n) along axis.
rename(name[, inplace]): Alter Index name.
repeat(repeats[, axis]): Repeats elements consecutively.
sample([n, frac, replace, weights, …]): Return a random sample of items from an axis of object.
scatter_by_map(map_index[, map_size, keep_index]): Scatter to a list of dataframes.
searchsorted(values[, side, ascending, …]): Find indices where elements should be inserted to maintain order.
shift([periods, freq, axis, fill_value]): Shift values by periods positions.
sin(): Get Trigonometric sine, element-wise.
sort_values([return_indexer, ascending, key]): Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
sqrt(): Get the non-negative square-root of all elements, element-wise.
sum(): Return the sum of all values of the Index.
take(indices): Gather only the specific subset of indices.
tan(): Get Trigonometric tangent, element-wise.
tile(count): Repeat the rows of this DataFrame count times to form a new DataFrame.
to_array([fillna]): Get a dense numpy array for the data.
to_arrow(): Convert Index to a PyArrow Array.
to_dlpack(): Converts a cuDF object into a DLPack tensor.
to_frame([index, name]): Create a DataFrame with a column containing this Index.
to_pandas(): Convert to a Pandas Index.
to_series([index, name]): Create a Series with both index and values equal to the index keys.
unique(): Return unique values in the index.
where(cond[, other]): Replace values where the condition is False.
replace
-
acos()¶ Get Trigonometric inverse cosine, element-wise.
The inverse of cos so that, if y = x.cos(), then x = y.acos() for x in [0, π].
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64
acos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629
acos operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([3.141592653589793, 1.1592794807274085, 0.0, 1.5707963267948966, 1.266103672779499], dtype='float64')
-
any()¶ Return whether any element is True in the Index.
-
append(other)¶ Append a collection of Index objects together.
- Parameters
- other : Index or list/tuple of indices
- Returns
- appended : Index
Examples
>>> import cudf
>>> idx = cudf.Index([1, 2, 10, 100])
>>> idx
Int64Index([1, 2, 10, 100], dtype='int64')
>>> other = cudf.Index([200, 400, 50])
>>> other
Int64Index([200, 400, 50], dtype='int64')
>>> idx.append(other)
Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')
append also accepts a list of Index objects:
>>> idx.append([other, other])
Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
-
argsort(ascending=True, **kwargs)¶ Return the integer indices that would sort the index.
- Parameters
- ascending : bool, default True
If True, returns the indices for ascending order. If False, returns the indices for descending order.
- Returns
- array : cupy.ndarray
A CuPy array of integer indices that would sort the index if used as an indexer.
-
asin()¶ Get Trigonometric inverse sine, element-wise.
The inverse of sine so that, if y = x.sin(), then x = y.asin() for x in [-π/2, π/2].
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.asin()
0   -1.570796
1    0.000000
2    1.570796
3    0.330314
4    0.523599
dtype: float64
asin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.asin()
      first    second
0 -1.570796  0.236190
1  0.000000  0.304693
2  0.523599  0.100167
asin operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64')
>>> index.asin()
Float64Index([-1.5707963267948966, 0.41151684606748806, 1.5707963267948966, 0.3046926540153975], dtype='float64')
-
astype(dtype, copy=False)¶ Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.
- Parameters
- dtype : numpy dtype
The numpy.dtype to cast the entire Index to.
- copy : bool, default False
If copy is True, astype always returns a newly allocated object. If copy is False (the default) and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.
- Returns
- Index
Index with values cast to specified dtype.
-
atan()¶ Get Trigonometric inverse tangent, element-wise.
The inverse of tan so that, if y = x.tan(), then x = y.atan() for x in (-π/2, π/2).
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10])
>>> ser
0    -1.00000
1     0.00000
2     1.00000
3     0.32434
4     0.50000
5   -10.00000
dtype: float64
>>> ser.atan()
0   -0.785398
1    0.000000
2    0.785398
3    0.313635
4    0.463648
5   -1.471128
dtype: float64
atan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.atan()
      first    second
0 -0.785398  0.229864
1 -1.471128  0.291457
2  0.463648  1.471128
atan operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.atan()
Float64Index([-0.7853981633974483, 0.3805063771123649, 0.7853981633974483, 0.0, 0.2914567944778671], dtype='float64')
-
clip(lower=None, upper=None, inplace=False, axis=1)¶ Trim values at input threshold(s).
Assigns values outside the boundary to the boundary values. Thresholds can be singular values or array-like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.
- Parameters
- lower : scalar or array_like, default None
Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.
- upper : scalar or array_like, default None
Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series/Index, upper is expected to be a scalar or an array of size 1.
- inplace : bool, default False
Whether to perform the operation in place on the data.
- Returns
- Clipped DataFrame/Series/Index/MultiIndex
Examples
>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']})
>>> df.clip(lower=[2, 'b'], upper=[3, 'c'])
   a  b
0  2  b
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=None, upper=[3, 'c'])
   a  b
0  1  a
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=[2, 'b'], upper=None)
   a  b
0  2  b
1  2  b
2  3  c
3  4  d
>>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":[2, 3, 4, 5]})
>>> df.clip(lower=2, upper=3, inplace=True)
>>> df
   a  b
0  2  2
1  2  3
2  3  3
3  3  3
>>> import cudf
>>> sr = cudf.Series([1, 2, 3, 4])
>>> sr.clip(lower=2, upper=3)
0    2
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=None, upper=3)
0    1
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True)
>>> sr
0    2
1    2
2    3
3    4
dtype: int64
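The clipping rule for scalar thresholds is simple enough to state directly. The pure-Python sketch below reproduces the Series results above and, because it relies only on comparisons, also the per-column string case from the DataFrame example. The function name `clip` is hypothetical; this is an illustration of the rule, not cuDF's GPU kernel.

```python
def clip(values, lower=None, upper=None):
    """Hypothetical sketch of scalar clipping, as in Series.clip/Index.clip."""
    out = []
    for v in values:
        # Values below `lower` are raised to it; values above `upper`
        # are lowered to it. A None bound disables that side.
        if lower is not None and v < lower:
            v = lower
        if upper is not None and v > upper:
            v = upper
        out.append(v)
    return out


print(clip([1, 2, 3, 4], lower=2, upper=3))  # [2, 2, 3, 3]
print(clip([1, 2, 3, 4], upper=3))           # [1, 2, 3, 3]
```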
-
copy(deep=True)¶ Make a copy of this object.
- Parameters
- deep : bool, default True
Make a deep copy of the data. With deep=False, the data is not copied.
- Returns
- copy : Index
-
cos()¶ Get Trigonometric cosine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.cos()
0    1.000000
1    0.947861
2    0.877583
3    0.525322
4   -0.448074
5   -0.598460
6   -0.283691
dtype: float64
cos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.cos()
      first    second
0  1.000000  0.862319
1  0.283662 -0.283691
2 -0.839072 -0.839039
3 -0.759688 -0.022097
cos operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.cos()
Float64Index([0.9210609940028851, 0.8623188722876839, -0.5984600690578581, -0.4480736161291701], dtype='float64')
-
difference(other, sort=None)¶ Return a new Index with elements from the index that are not in other.
This is the set difference of two Index objects.
- Parameters
- other : Index or array-like
- sort : False or None, default None
Whether to sort the resulting index. By default, the values are attempted to be sorted, but any TypeError from incomparable elements is caught by cudf.
None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.
False : Do not sort the result.
- Returns
- difference : Index
Examples
>>> import cudf
>>> idx1 = cudf.Index([2, 1, 3, 4])
>>> idx1
Int64Index([2, 1, 3, 4], dtype='int64')
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx2
Int64Index([3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
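The two sort modes differ only in a final, best-effort sort. The pure-Python sketch below reproduces both results above. It assumes, as pandas does, that the result contains no duplicates; the function name `difference` is hypothetical and this is not cuDF's implementation.

```python
def difference(left, right, sort=None):
    """Hypothetical sketch of Index.difference semantics."""
    exclude = set(right)
    # Keep each value of `left` that is not in `right`, dropping repeats
    # (pandas-style assumption) and preserving first-seen order.
    seen = set()
    result = []
    for v in left:
        if v not in exclude and v not in seen:
            seen.add(v)
            result.append(v)
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            # Mirrors the documented behavior: incomparable elements
            # leave the result unsorted instead of raising.
            pass
    return result


print(difference([2, 1, 3, 4], [3, 4, 5, 6]))              # [1, 2]
print(difference([2, 1, 3, 4], [3, 4, 5, 6], sort=False))  # [2, 1]
```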
-
drop_duplicates(keep='first')¶ Return Index with duplicate values removed.
- Parameters
- keep : {'first', 'last', False}, default 'first'
- 'first' : Drop duplicates except for the first occurrence.
- 'last' : Drop duplicates except for the last occurrence.
- False : Drop all duplicates.
- Returns
- deduplicated : Index
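The three keep modes are easiest to compare side by side. The pure-Python sketch below implements each one on a list; `drop_duplicates` is a hypothetical stand-alone function, and it preserves input order, whereas the output order of the cuDF method is not specified here and may differ.

```python
from collections import Counter


def drop_duplicates(values, keep="first"):
    """Hypothetical sketch of the three `keep` modes of drop_duplicates."""
    if keep == "first":
        seen = set()
        out = []
        for v in values:
            if v not in seen:
                seen.add(v)
                out.append(v)
        return out
    if keep == "last":
        # Scan from the end so the last occurrence wins, then restore order.
        return drop_duplicates(values[::-1], keep="first")[::-1]
    if keep is False:
        # Keep only values that occur exactly once.
        counts = Counter(values)
        return [v for v in values if counts[v] == 1]
    raise ValueError("keep must be 'first', 'last', or False")


vals = [3, 1, 3, 2, 1]
print(drop_duplicates(vals))               # [3, 1, 2]
print(drop_duplicates(vals, keep="last"))  # [3, 2, 1]
print(drop_duplicates(vals, keep=False))   # [2]
```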
-
dropna(how='any')¶ Return an Index with null values removed.
- Parameters
- how : {'any', 'all'}, default 'any'
If the Index is a MultiIndex, drop the value when any or all levels are NaN.
- Returns
- valid : Index
Examples
>>> import cudf
>>> index = cudf.Index(['a', None, 'b', 'c'])
>>> index
StringIndex(['a' None 'b' 'c'], dtype='object')
>>> index.dropna()
StringIndex(['a' 'b' 'c'], dtype='object')
Using dropna on a MultiIndex:
>>> midx = cudf.MultiIndex(
...         levels=[[1, None, 4, None], [1, 2, 5]],
...         codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...         names=["x", "y"],
...     )
>>> midx
MultiIndex(levels=[0       1
1    null
2       4
3    null
dtype: int64, 0    1
1    2
2    5
dtype: int64],
codes=   x  y
0  0  0
1  0  2
2  1  1
3  2  1
4  3  0)
>>> midx.dropna()
MultiIndex(levels=[0    1
1    4
dtype: int64, 0    1
1    2
2    5
dtype: int64],
codes=   x  y
0  0  0
1  0  2
2  1  1)
-
property
dtype¶ dtype of the underlying values in GenericIndex.
-
property
empty¶ Indicator whether Index is empty.
True if Index is entirely empty (no items).
- Returns
- out : bool
True if the Index is empty, otherwise False.
-
equals(other)¶ Determine if two Index objects contain the same elements.
- Returns
- out : bool
True if “other” is an Index and it has the same elements as calling index; False otherwise.
-
exp()¶ Get the exponential of all elements, element-wise.
Exponential is the inverse of the log function, so that x.exp().log() = x.
- Returns
- DataFrame/Series/Index
Result of the element-wise exponential.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.exp()
0    3.678794e-01
1    1.000000e+00
2    2.718282e+00
3    1.383117e+00
4    1.648721e+00
5    4.539993e-05
6    2.688117e+43
dtype: float64
exp operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.exp()
      first        second
0  0.367879      1.263644
1  0.000045      1.349859
2  1.648721  22026.465795
exp operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.exp()
Float64Index([0.36787944117144233, 1.4918246976412703, 2.718281828459045, 1.0, 1.3498588075760032], dtype='float64')
-
fillna(value, downcast=None)¶ Fill null values with the specified value.
- Parameters
- value : scalar
Scalar value to use to fill nulls. This value cannot be list-like.
- downcast : dict, default None
This parameter is currently non-functional.
- Returns
- filled : Index
Examples
>>> import cudf
>>> index = cudf.Index([1, 2, None, 4])
>>> index
Int64Index([1, 2, null, 4], dtype='int64')
>>> index.fillna(3)
Int64Index([1, 2, 3, 4], dtype='int64')
-
find_label_range(first, last)¶ Find range that starts with first and ends with last, inclusively.
- Returns
- begin, end : 2-tuple of int
The starting index and the ending index. The last value occurs at position end - 1.
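Because the label range is inclusive while the returned (begin, end) pair is half-open, end points one past the last match. For a monotonically increasing index this maps onto binary search, as in the pure-Python sketch below. The sortedness assumption and the function name are illustrative only; cuDF's own lookup logic is not shown here.

```python
from bisect import bisect_left, bisect_right


def find_label_range(sorted_labels, first, last):
    """Hypothetical sketch, assuming a monotonically increasing index."""
    # `begin` is the first position holding a label >= first.
    begin = bisect_left(sorted_labels, first)
    # `end` is one past the last position holding a label <= last,
    # so the half-open slice [begin:end] includes `last` itself.
    end = bisect_right(sorted_labels, last)
    return begin, end


labels = [1, 3, 3, 5, 8]
begin, end = find_label_range(labels, 3, 5)
print(begin, end)         # 1 4
print(labels[begin:end])  # [3, 3, 5]
```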
-
classmethod
from_pandas(index, nan_as_null=None)¶ Convert from a Pandas Index.
- Parameters
- index : Pandas Index object
A Pandas Index object which has to be converted to cuDF Index.
- nan_as_null : bool, default None
If None or True, converts np.nan values to null values. If False, leaves np.nan values as is.
- Raises
- TypeError for invalid input type.
Examples
>>> import cudf
>>> import pandas as pd
>>> import numpy as np
>>> data = [10, 20, 30, np.nan]
>>> pdi = pd.Index(data)
>>> cudf.core.index.Index.from_pandas(pdi)
Float64Index([10.0, 20.0, 30.0, null], dtype='float64')
>>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False)
Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
-
get_level_values(level)¶ Return an Index of values for requested level.
This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.
- Parameters
- level : int or str
It is either the integer position or the name of the level.
- Returns
- Index
Calling object, as there is only one level in the Index.
See also
cudf.core.multiindex.MultiIndex.get_level_values : Get values for a level of a MultiIndex.
Notes
For Index, level should be 0, since there are no multiple levels.
Examples
>>> import cudf
>>> idx = cudf.core.index.StringIndex(["a", "b", "c"])
>>> idx.get_level_values(0)
StringIndex(['a' 'b' 'c'], dtype='object')
-
get_slice_bound(label, side, kind)¶ Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.
- Parameters
- label : object
- side : {'left', 'right'}
- kind : {'ix', 'loc', 'getitem'}
- Returns
- int
Index of label.
-
property
gpu_values¶ View the data as a numba device array object.
-
interleave_columns()¶ Interleave Series columns of a table into a single column.
Converts the column major table cols into a row major column.
- Parameters
- cols : input Table containing columns to interleave.
- Returns
- The interleaved columns as a single column
Examples
>>> import cudf
>>> df = cudf.DataFrame({0: ['A1', 'A2', 'A3'], 1: ['B1', 'B2', 'B3']})
>>> df
    0   1
0  A1  B1
1  A2  B2
2  A3  B3
>>> df.interleave_columns()
0    A1
1    B1
2    A2
3    B2
4    A3
5    B3
dtype: object
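Interleaving reads the table one row at a time and emits each row's values left to right, which turns column-major storage into a single row-major column. The pure-Python sketch below shows this with plain lists standing in for columns; `interleave_columns` here is a hypothetical stand-alone function, not the cuDF method.

```python
def interleave_columns(columns):
    """Hypothetical sketch: interleave equal-length columns row by row."""
    if len({len(c) for c in columns}) > 1:
        raise ValueError("all columns must have the same length")
    # zip(*columns) yields one tuple per row; flattening the rows in order
    # turns the column-major table into a single row-major column.
    return [value for row in zip(*columns) for value in row]


print(interleave_columns([["A1", "A2", "A3"], ["B1", "B2", "B3"]]))
# ['A1', 'B1', 'A2', 'B2', 'A3', 'B3']
```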
-
property
is_monotonic¶ Alias for is_monotonic_increasing.
-
property
is_monotonic_decreasing¶ Return if the index is monotonic decreasing (only equal or decreasing) values.
-
property
is_monotonic_increasing¶ Return if the index is monotonic increasing (only equal or increasing) values.
-
property
is_unique¶ Return if the index has unique values.
-
isin(values)¶ Return a boolean array where the index values are in values.
Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.
- Parameters
- values : set, list-like, Index
Sought values.
- Returns
- is_contained : cupy array
CuPy array of boolean values.
CuPy array of boolean values.
-
isna()¶ Identify missing values. Alias for isnull.
-
isnull()¶ Identify missing values.
-
join(other, how='left', level=None, return_indexers=False, sort=False)¶ Compute join_index and indexers to conform data structures to the new index.
- Parameters
- other : Index
- how : {'left', 'right', 'inner', 'outer'}
- return_indexers : bool, default False
- sort : bool, default False
Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).
- Returns
- Index
Examples
>>> import cudf
>>> lhs = cudf.DataFrame(
...     {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b']
... ).index
>>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index
>>> lhs.join(rhs, how='inner')
MultiIndex(levels=[0    1
1    3
dtype: int64, 0    2
1    4
dtype: int64],
codes=   a  b
0  1  1
1  0  0)
-
log()¶ Get the natural logarithm of all elements, element-wise.
Natural logarithm is the inverse of the exp function, so that x.log().exp() = x for positive x.
- Returns
- DataFrame/Series/Index
Result of the element-wise natural logarithm.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.log()
0         NaN
1        -inf
2    0.000000
3   -1.125963
4   -0.693147
5         NaN
6    4.605170
dtype: float64
log operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.log()
      first    second
0       NaN -1.452434
1       NaN -1.203973
2 -0.693147  2.302585
log operation on Index:
>>> index = cudf.Index([10, 11, 500.0])
>>> index
Float64Index([10.0, 11.0, 500.0], dtype='float64')
>>> index.log()
Float64Index([2.302585092994046, 2.3978952727983707, 6.214608098422191], dtype='float64')
-
mask(cond, other=None, inplace=False)¶ Replace values where the condition is True.
- Parameters
- cond : bool Series/DataFrame, array-like
Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.
- other : scalar, list of scalars, Series/DataFrame
Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.
For a DataFrame, other must be a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.
For a Series, other must be a scalar or a Series-like object of the same length.
- inplace : bool, default False
Whether to perform the operation in place on the data.
- Returns
- Same type as caller
Examples
>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.mask(df % 2 == 0, [-1, -1])
   A  B
0  1  3
1 -1  5
2  5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.mask(ser > 2, 10)
0    10
1    10
2     2
3     1
4     0
dtype: int64
>>> ser.mask(ser > 2)
0    null
1    null
2       2
3       1
4       0
dtype: int64
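mask is the complement of where: a True entry in cond means "replace" rather than "keep". The pure-Python sketch below reproduces the Series results above, using None as a stand-in for null. The function name `mask` is hypothetical and the sketch covers only the scalar and list forms of other.

```python
def mask(values, cond, other=None):
    """Hypothetical sketch of Series.mask: replace values where cond is True."""
    if len(cond) != len(values):
        raise ValueError("cond must have the same length as values")
    # A scalar `other` is broadcast; a list supplies one replacement per row.
    others = other if isinstance(other, list) else [other] * len(values)
    return [o if c else v for v, c, o in zip(values, cond, others)]


ser = [4, 3, 2, 1, 0]
cond = [v > 2 for v in ser]
print(mask(ser, cond, 10))  # [10, 10, 2, 1, 0]
print(mask(ser, cond))      # [None, None, 2, 1, 0]
```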
-
max()¶ Return the maximum value of the Index.
- Returns
- scalar
Maximum value.
See also
Index.min : Return the minimum value in an Index.
cudf.core.series.Series.max : Return the maximum value in a Series.
cudf.core.dataframe.DataFrame.max : Return the maximum values in a DataFrame.
Examples
>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.max()
3
-
memory_usage(deep=False)¶ Memory usage of the values.
- Parameters
- deep : bool
Introspect the data deeply, interrogate object dtypes for system-level memory consumption.
- Returns
- bytes used
-
min()¶ Return the minimum value of the Index.
- Returns
- scalar
Minimum value.
See also
Index.max : Return the maximum value in an Index.
cudf.core.series.Series.min : Return the minimum value in a Series.
cudf.core.dataframe.DataFrame.min : Return the minimum values in a DataFrame.
Examples
>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.min()
1
-
property
name¶ Returns the name of the Index.
-
property
names¶ Returns a tuple containing the name of the Index.
-
property
ndim¶ Dimension of the data. Apart from MultiIndex, ndim is always 1.
-
notna()¶ Identify non-missing values. Alias for notnull.
-
notnull()¶ Identify non-missing values.
-
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)¶ Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.
- Parameters
- axis : {0 or 'index', 1 or 'columns'}, default 0
Index to direct ranking.
- method : {'average', 'min', 'max', 'first', 'dense'}, default 'average'
How to rank the group of records that have the same value (i.e. ties):
* average: average rank of the group
* min: lowest rank in the group
* max: highest rank in the group
* first: ranks assigned in order they appear in the array
* dense: like 'min', but rank always increases by 1 between groups.
- numeric_only : bool, optional
For DataFrame objects, rank only numeric columns if set to True.
- na_option : {'keep', 'top', 'bottom'}, default 'keep'
How to rank NaN values:
* keep: assign NaN rank to NaN values
* top: assign smallest rank to NaN values if ascending
* bottom: assign highest rank to NaN values if ascending.
- ascending : bool, default True
Whether or not the elements should be ranked in ascending order.
- pct : bool, default False
Whether or not to display the returned rankings in percentile form.
- Returns
- same type as caller
Return a Series or DataFrame with data ranks as values.
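The five tie-handling methods listed above differ only in how a run of equal values shares its ranks. The pure-Python sketch below ranks a list in ascending order under each method. It assumes no nulls, ascending order, and pct=False, and the function name `rank` is hypothetical; it illustrates the rules, not cuDF's GPU implementation.

```python
def rank(values, method="average"):
    """Hypothetical sketch of rank tie handling (ascending, no nulls)."""
    # Stable sort of positions by value: ties keep their input order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    dense = 0
    i = 0
    while i < len(order):
        # Find the run [i, j) of sorted positions sharing the same value.
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        dense += 1
        for k in range(i, j):
            pos = order[k]
            if method == "average":
                ranks[pos] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
            elif method == "min":
                ranks[pos] = float(i + 1)
            elif method == "max":
                ranks[pos] = float(j)
            elif method == "first":
                ranks[pos] = float(k + 1)  # order of appearance breaks ties
            elif method == "dense":
                ranks[pos] = float(dense)
            else:
                raise ValueError(f"unknown method: {method}")
        i = j
    return ranks


data = [10, 20, 10, 30]
print(rank(data))                  # [1.5, 3.0, 1.5, 4.0]
print(rank(data, method="dense"))  # [1.0, 2.0, 1.0, 3.0]
```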
-
rename(name, inplace=False)¶ Alter Index name.
Defaults to returning new index.
- Parameters
- name : label
Name(s) to set.
- Returns
- Index
-
repeat(repeats, axis=None)¶ Repeats elements consecutively.
Returns a new object of the caller's type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.
- Parameters
- repeats : int, or array of ints
The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.
- Returns
- Series/DataFrame/Index
A newly created object of same type as caller with repeated elements.
Examples
>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})
>>> df
   a   b
0  1  10
1  2  20
2  3  30
>>> df.repeat(3)
   a   b
0  1  10
0  1  10
0  1  10
1  2  20
1  2  20
1  2  20
2  3  30
2  3  30
2  3  30
Repeat on Series
>>> s = cudf.Series([0, 2])
>>> s
0    0
1    2
dtype: int64
>>> s.repeat([3, 4])
0    0
0    0
0    0
1    2
1    2
1    2
1    2
dtype: int64
>>> s.repeat(2)
0    0
0    0
1    2
1    2
dtype: int64
Repeat on Index
>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33, 33, 33, 33, 33,
            55, 55, 55, 55, 55],
           dtype='int64')
-
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)¶ Return a random sample of items from an axis of object.
You can use random_state for reproducibility.
- Parameters
- n : int, optional
Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.
- frac : float, optional
Fraction of axis items to return. Cannot be used with n.
- replace : bool, default False
Allow or disallow sampling of the same row more than once. replace == True is not yet supported for axis = 1/‘columns’.
- weights : str or ndarray-like, optional
Only supported for axis=1/‘columns’.
- random_state : int or None, default None
Seed for the random number generator (if int), or None. If None, a random seed will be chosen.
- axis : {0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index don’t support axis=1.
- Returns
- Series or DataFrame or Index
A new object of same type as caller containing n items randomly sampled from the caller object.
Examples
>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1
>>> sr = cudf.Series([1, 2, 3, 4, 5])
>>> sr.sample(10, replace=True)
1    2
3    4
2    3
0    1
0    1
4    5
4    5
0    1
0    1
3    4
dtype: int64
>>> df = cudf.DataFrame(
...     {"a": [1, 2], "b": [2, 3], "c": [3, 4], "d": [4, 5]})
>>> df.sample(2, axis=1)
   a  c
0  1  3
1  2  4
-
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)¶ Scatter to a list of dataframes.
Uses map_index to determine the destination of each row of the original DataFrame.
- Parameters
- map_index : Series, str or list-like
Scatter assignment for each row.
- map_size : int
Length of output list. Must be >= the number of unique values in map_index.
- keep_index : bool
Preserve the original index values for each row.
- Returns
- A list of cudf.DataFrame objects.
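The routing rule described above can be sketched in plain Python. This is an illustration under stated assumptions, not cudf's implementation. Rows are a plain list, and `scatter_sketch` is a hypothetical name. When `map_size` is omitted, the sketch sizes the output to cover the largest label, which may differ from how cudf chooses a default.

```python
from typing import Any, List, Optional, Sequence, Tuple


def scatter_sketch(rows: Sequence[Any], map_index: Sequence[int],
                   map_size: Optional[int] = None) -> List[List[Tuple[int, Any]]]:
    """Hypothetical helper: send row i to output list number map_index[i]."""
    if len(rows) != len(map_index):
        raise ValueError("map_index must have one entry per row")
    if map_size is None:
        # Assumption of this sketch: cover every label that appears.
        map_size = max(map_index, default=-1) + 1
    buckets: List[List[Tuple[int, Any]]] = [[] for _ in range(map_size)]
    for position, (row, dest) in enumerate(zip(rows, map_index)):
        if not 0 <= dest < map_size:
            raise ValueError(f"map_index value {dest} is outside [0, {map_size})")
        # Keep the original row position next to the row (like keep_index=True).
        buckets[dest].append((position, row))
    return buckets


print(scatter_sketch(["a", "b", "c", "d"], [1, 0, 1, 2]))
# [[(1, 'b')], [(0, 'a'), (2, 'c')], [(3, 'd')]]
```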
-
searchsorted(values, side='left', ascending=True, na_position='last')¶ Find indices where elements should be inserted to maintain order.
- Parameters
- values : Frame (shape must be consistent with self)
Values to be hypothetically inserted into self.
- side : str {‘left’, ‘right’}, optional, default ‘left’
If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.
- ascending : bool, optional, default True
Whether self is sorted in ascending order (otherwise descending).
- na_position : str {‘last’, ‘first’}, optional, default ‘last’
Position of null values in sorted order.
- Returns
- 1-D cupy array of insertion points
Examples
>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)
If the values are not monotonically sorted, wrong locations may be returned:
>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0  # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
...                             'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
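For a single ascending column, the `side` semantics match Python's standard `bisect` module, which gives a convenient CPU mental model. The sketch below uses a hypothetical `searchsorted_sketch`; it is not cudf's implementation and ignores `ascending`, `na_position`, and multi-column frames. It reproduces the Series results above, including the misleading answer for unsorted input.

```python
import bisect
from typing import List, Sequence


def searchsorted_sketch(sorted_values: Sequence[int], values: Sequence[int],
                        side: str = "left") -> List[int]:
    """Hypothetical helper: insertion points into an ascending sequence."""
    if side == "left":
        find = bisect.bisect_left    # first position where v could be inserted
    elif side == "right":
        find = bisect.bisect_right   # position just past any entries equal to v
    else:
        raise ValueError("side must be 'left' or 'right'")
    return [find(sorted_values, v) for v in values]


print(searchsorted_sketch([1, 2, 3], [0, 4]))                # [0, 3]
print(searchsorted_sketch([1, 2, 3], [1, 3], side="right"))  # [1, 3]
print(searchsorted_sketch([2, 1, 3], [1]))                   # [0]: input not sorted
```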
-
property
shape¶ Returns a tuple representing the dimensionality of the Index.
-
shift(periods=1, freq=None, axis=0, fill_value=None)¶ Shift values by periods positions.
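The shift() entry gives no further detail. Below is a hedged sketch of the usual pandas-style meaning, which cudf is assumed to follow. A positive `periods` moves values toward the end and fills the vacated leading slots with `fill_value` (`None` stands in for null). A negative `periods` moves values toward the start instead. `shift_sketch` is a hypothetical helper operating on a plain list, not cudf's API.

```python
from typing import Any, List, Sequence


def shift_sketch(values: Sequence[Any], periods: int = 1, fill_value: Any = None) -> List[Any]:
    """Hypothetical helper: shift a 1-D sequence by `periods` positions."""
    n = len(values)
    if abs(periods) >= n:
        # Every value is shifted out; only fill values remain.
        return [fill_value] * n
    if periods >= 0:
        # Move values down; the first `periods` slots become fill_value.
        return [fill_value] * periods + list(values[:n - periods])
    # Negative periods: move values up; the last slots become fill_value.
    return list(values[-periods:]) + [fill_value] * (-periods)


print(shift_sketch([1, 2, 3, 4]))                    # [None, 1, 2, 3]
print(shift_sketch([1, 2, 3, 4], -2, fill_value=0))  # [3, 4, 0, 0]
```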
-
sin()¶ Get Trigonometric sine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.sin()
0    0.000000
1    0.318683
2    0.479426
3    0.850904
4    0.893997
5   -0.801153
6    0.958916
dtype: float64
sin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.sin()
      first    second
0  0.000000 -0.506366
1 -0.958924  0.958916
2 -0.544021 -0.544072
3  0.650288 -0.999756
sin operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.sin()
Float64Index([-0.3894183423086505, -0.5063656411097588,
               0.8011526357338306,  0.8939966636005579],
             dtype='float64')
-
property
size¶ Return the number of elements in the underlying data.
- Returns
- size
Size of the DataFrame / Index / Series / MultiIndex.
Examples
Size of an empty dataframe is 0.
>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0
DataFrame with values
>>> df = cudf.DataFrame({'a': [10, 11, 12],
...                      'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3
Size of an Index
>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Int64Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4
Size of a MultiIndex
>>> midx = cudf.MultiIndex(
...     levels=[["a", "b", "c", None], ["1", None, "5"]],
...     codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...     names=["x", "y"],
... )
>>> midx
MultiIndex(levels=[0       a
1       b
2       c
3    None
dtype: object, 0       1
1    None
2       5
dtype: object],
codes=   x  y
0  0  0
1  0  2
2  1  1
3  2  1
4  3  0)
>>> midx.size
5
-
sort_values(return_indexer=False, ascending=True, key=None)¶ Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
- Parameters
- return_indexer : bool, default False
Should the indices that would sort the index be returned.
- ascending : bool, default True
Should the index values be sorted in an ascending order.
- key : None, optional
This parameter is NON-FUNCTIONAL.
- Returns
- sorted_index : Index
Sorted copy of the index.
- indexer : cupy.ndarray, optional
The indices that the index itself was sorted by.
See also
cudf.core.series.Series.sort_values : Sort values of a Series.
cudf.core.dataframe.DataFrame.sort_values : Sort values in a DataFrame.
Examples
>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')
Sort values in ascending order (default behavior).
>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')
Sort values in descending order, and also get the indices idx was sorted by.
>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2], dtype=int32))
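The indexer relates to the sorted copy in the same way as sorting a list of positions by value (an argsort). The minimal pure-Python sketch below uses a hypothetical `sort_values_sketch`. cudf computes this on the GPU and returns a cupy array rather than a list. The sketch reproduces the descending example above.

```python
from typing import List, Sequence, Tuple, Union


def sort_values_sketch(values: Sequence[int], ascending: bool = True,
                       return_indexer: bool = False
                       ) -> Union[List[int], Tuple[List[int], List[int]]]:
    """Hypothetical helper: sorted copy plus the positions it came from."""
    # The indexer is the argsort: original positions ordered by their values.
    indexer = sorted(range(len(values)), key=values.__getitem__,
                     reverse=not ascending)
    result = [values[i] for i in indexer]
    return (result, indexer) if return_indexer else result


print(sort_values_sketch([10, 100, 1, 1000], ascending=False, return_indexer=True))
# ([1000, 100, 10, 1], [3, 1, 0, 2])
```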
-
sqrt()¶ Get the non-negative square-root of all elements, element-wise.
- Returns
- DataFrame/Series/Index
Result of the non-negative square-root of each element.
Examples
>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64
sqrt operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-10.0, 100, 625],
...                      'second': [1, 2, 0.4]})
>>> df
   first  second
0  -10.0     1.0
1  100.0     2.0
2  625.0     0.4
>>> df.sqrt()
   first    second
0    NaN  1.000000
1   10.0  1.414214
2   25.0  0.632456
sqrt operation on Index:
>>> index = cudf.Index([-10.0, 100, 625])
>>> index
Float64Index([-10.0, 100.0, 625.0], dtype='float64')
>>> index.sqrt()
Float64Index([nan, 10.0, 25.0], dtype='float64')
-
sum()¶ Return the sum of all values of the Index.
- Returns
- scalar
Sum of all values.
Examples
>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.sum()
6
-
take(indices)¶ Gather only the specific subset of indices.
- Parameters
- indices : array-like
Positions that map to values contained in this Index.
-
tan()¶ Get Trigonometric tangent, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.tan()
0    0.000000
1    0.336213
2    0.546302
3    1.619775
4   -1.995200
5    1.338690
6   -3.380140
dtype: float64
tan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.tan()
      first     second
0  0.000000  -0.587214
1 -3.380515  -3.380140
2  0.648361   0.648446
3 -0.855993  45.244742
tan operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.tan()
Float64Index([-0.4227932187381618,  -0.587213915156929,
              -1.3386902103511544,  -1.995200412208242],
             dtype='float64')
-
tile(count)¶ Repeats the rows from self DataFrame count times to form a new DataFrame.
- Parameters
- self : input Table containing the rows to tile.
- count : int
Number of times to tile “rows”. Must be non-negative.
- Returns
- The table containing the tiled “rows”.
Examples
>>> import cudf
>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
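repeat() and tile() are easy to confuse. repeat() emits each element several times in a row, while tile() emits the whole sequence of rows several times. The pure-Python sketch below contrasts the two on plain lists. `repeat_sketch` and `tile_sketch` are hypothetical helpers, not cudf functions. They reproduce the Series.repeat and DataFrame.tile results shown in this reference.

```python
from typing import Any, List, Sequence, Union


def repeat_sketch(values: Sequence[Any], repeats: Union[int, Sequence[int]]) -> List[Any]:
    """Hypothetical helper: element i is emitted repeats[i] times, consecutively."""
    if isinstance(repeats, int):
        repeats = [repeats] * len(values)
    if len(repeats) != len(values):
        raise ValueError("repeats must be an int or have one entry per element")
    out: List[Any] = []
    for value, count in zip(values, repeats):
        if count < 0:
            raise ValueError("repeats must be non-negative")
        out.extend([value] * count)
    return out


def tile_sketch(rows: Sequence[Any], count: int) -> List[Any]:
    """Hypothetical helper: the whole block of rows is emitted `count` times."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return list(rows) * count


print(repeat_sketch([0, 2], [3, 4]))           # [0, 0, 0, 2, 2, 2, 2]
print(tile_sketch([[8, 4, 7], [5, 2, 3]], 2))  # [[8, 4, 7], [5, 2, 3], [8, 4, 7], [5, 2, 3]]
```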
-
to_array(fillna=None)¶ Get a dense numpy array for the data.
- Parameters
- fillna : str or None
Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.
Notes
If fillna is None, null values are skipped. Therefore, the output size could be smaller.
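The two fillna modes differ in whether nulls are dropped or turned into NaN. This can be sketched with a plain list in which `None` stands for null. `to_array_sketch` is hypothetical and is not cudf's code. For simplicity, it also assumes that every value is converted to float in the 'pandas' mode so that NaN can be represented.

```python
import math
from typing import List, Optional, Sequence


def to_array_sketch(values: Sequence[Optional[float]],
                    fillna: Optional[str] = None) -> List[float]:
    """Hypothetical helper mirroring the two documented fillna modes."""
    if fillna is None:
        # Nulls are skipped, so the result can be shorter than the input.
        return [v for v in values if v is not None]
    if fillna == "pandas":
        # Nulls become NaN; values are promoted to float (sketch assumption).
        return [math.nan if v is None else float(v) for v in values]
    raise ValueError("fillna must be None or 'pandas'")


print(to_array_sketch([1, None, 3]))            # [1, 3]
print(to_array_sketch([1, None, 3], "pandas"))  # [1.0, nan, 3.0]
```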
-
to_arrow()¶ Convert Index to a PyArrow Array.
Examples
>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx.to_arrow()
<pyarrow.lib.Int64Array object at 0x7fcaa6f53440>
[
  -3,
  10,
  15,
  20
]
-
to_dlpack()¶ Converts a cuDF object into a DLPack tensor.
DLPack is an open-source memory tensor structure: dmlc/dlpack.
This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.
- Parameters
- cudf_obj : DataFrame, Series, Index, or Column
- Returns
- pycapsule_obj : PyCapsule
Output DLPack tensor pointer which is encapsulated in a PyCapsule object.
-
to_frame(index=True, name=None)¶ Create a DataFrame with a column containing this Index.
- Parameters
- index : boolean, default True
Set the index of the returned DataFrame as the original Index.
- name : str, default None
Name to be used for the column.
- Returns
- DataFrame
cudf DataFrame
-
to_pandas()¶ Convert to a Pandas Index.
Examples
>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.numeric.Int64Index'>
>>> type(idx)
<class 'cudf.core.index.GenericIndex'>
-
to_series(index=None, name=None)¶ Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.
- Parameters
- index : Index, optional
Index of resulting Series. If None, defaults to original index.
- name : str, optional
Name of resulting Series. If None, defaults to name of original index.
- Returns
- Series
The dtype will be based on the type of the Index values.
-
unique()¶ Return unique values in the index.
- Returns
- Index without duplicates
-
property
values¶ Return an array representing the data in the Index.
- Returns
- array
A cupy array of data in the Index.
Examples
>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values
array([  1, -10, 100,  20])
>>> type(index.values)
<class 'cupy.core.core.ndarray'>
-
property
values_host¶ Return a numpy representation of the Index.
Only the values in the Index will be returned.
- Returns
- out : numpy.ndarray
The values of the Index.
Examples
>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values_host
array([  1, -10, 100,  20])
>>> type(index.values_host)
<class 'numpy.ndarray'>
-
where(cond, other=None)¶ Replace values where the condition is False.
- Parameters
- cond : bool array-like with the same length as self
Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.
- other : scalar, or array-like
Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.
- Returns
- Same type as caller
Examples
>>> import cudf
>>> index = cudf.Index([4, 3, 2, 1, 0])
>>> index
Int64Index([4, 3, 2, 1, 0], dtype='int64')
>>> index.where(index > 2, 15)
Int64Index([4, 3, 15, 15, 15], dtype='int64')
Float32Index¶
-
class
cudf.core.index.Float32Index(data=None, dtype=None, copy=False, name=None)¶ Immutable, ordered and sliceable sequence of float32 labels. The basic object storing row labels for all cuDF objects.
- Parameters
- data : array-like (1-dimensional) / DataFrame
If it is a DataFrame, it will return a MultiIndex.
- dtype : NumPy dtype, default None
If dtype is None, we find the dtype that best fits the data.
- copy : bool
Make a copy of input data.
- name : object
Name to be stored in the index.
- tupleize_cols : bool (default: True)
When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.
- Returns
- Index
cudf Index
Examples
>>> import cudf
>>> cudf.Index([1, 2, 3], dtype="uint64", name="a")
UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a": [1, 2], "b": [2, 3]}))
MultiIndex(levels=[0    1
1    2
dtype: int64, 0    2
1    3
dtype: int64],
codes=   a  b
0  0  0
1  1  1)
- Attributes
dtype : dtype of the underlying values in GenericIndex.
empty : Indicator whether Index is empty.
gpu_values : View the data as a numba device array object.
is_monotonic : Alias for is_monotonic_increasing.
is_monotonic_decreasing : Return if the index is monotonic decreasing (only equal or decreasing) values.
is_monotonic_increasing : Return if the index is monotonic increasing (only equal or increasing) values.
is_unique : Return if the index has unique values.
name : Returns the name of the Index.
names : Returns a tuple containing the name of the Index.
ndim : Dimension of the data.
shape : Returns a tuple representing the dimensionality of the Index.
size : Return the number of elements in the underlying data.
values : Return an array representing the data in the Index.
values_host : Return a numpy representation of the Index.
Methods
acos() : Get Trigonometric inverse cosine, element-wise.
any() : Return whether any element is True in Index.
append(other) : Append a collection of Index options together.
argsort([ascending]) : Return the integer indices that would sort the index.
asin() : Get Trigonometric inverse sine, element-wise.
astype(dtype[, copy]) : Create an Index with values cast to dtypes.
atan() : Get Trigonometric inverse tangent, element-wise.
clip([lower, upper, inplace, axis]) : Trim values at input threshold(s).
copy([deep]) : Make a copy of this object.
cos() : Get Trigonometric cosine, element-wise.
difference(other[, sort]) : Return a new Index with elements from the index that are not in other.
drop_duplicates([keep]) : Return Index with duplicate values removed.
dropna([how]) : Return an Index with null values removed.
equals(other) : Determine if two Index objects contain the same elements.
exp() : Get the exponential of all elements, element-wise.
fillna(value[, downcast]) : Fill null values with the specified value.
find_label_range(first, last) : Find range that starts with first and ends with last, inclusively.
from_pandas(index[, nan_as_null]) : Convert from a Pandas Index.
get_level_values(level) : Return an Index of values for requested level.
get_slice_bound(label, side, kind) : Calculate slice bound that corresponds to given label.
interleave_columns() : Interleave Series columns of a table into a single column.
isin(values) : Return a boolean array where the index values are in values.
isna() : Identify missing values.
isnull() : Identify missing values.
join(other[, how, level, return_indexers, sort]) : Compute join_index and indexers to conform data structures to the new index.
log() : Get the natural logarithm of all elements, element-wise.
mask(cond[, other, inplace]) : Replace values where the condition is True.
max() : Return the maximum value of the Index.
memory_usage([deep]) : Memory usage of the values.
min() : Return the minimum value of the Index.
notna() : Identify non-missing values.
notnull() : Identify non-missing values.
rank([axis, method, numeric_only, …]) : Compute numerical data ranks (1 through n) along axis.
rename(name[, inplace]) : Alter Index name.
repeat(repeats[, axis]) : Repeats elements consecutively.
sample([n, frac, replace, weights, …]) : Return a random sample of items from an axis of object.
scatter_by_map(map_index[, map_size, keep_index]) : Scatter to a list of dataframes.
searchsorted(values[, side, ascending, …]) : Find indices where elements should be inserted to maintain order.
shift([periods, freq, axis, fill_value]) : Shift values by periods positions.
sin() : Get Trigonometric sine, element-wise.
sort_values([return_indexer, ascending, key]) : Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
sqrt() : Get the non-negative square-root of all elements, element-wise.
sum() : Return the sum of all values of the Index.
take(indices) : Gather only the specific subset of indices.
tan() : Get Trigonometric tangent, element-wise.
tile(count) : Repeats the rows from self DataFrame count times to form a new DataFrame.
to_array([fillna]) : Get a dense numpy array for the data.
to_arrow() : Convert Index to a PyArrow Array.
to_dlpack() : Converts a cuDF object into a DLPack tensor.
to_frame([index, name]) : Create a DataFrame with a column containing this Index.
to_pandas() : Convert to a Pandas Index.
to_series([index, name]) : Create a Series with both index and values equal to the index keys.
unique() : Return unique values in the index.
where(cond[, other]) : Replace values where the condition is False.
replace
-
acos()¶ Get Trigonometric inverse cosine, element-wise.
The inverse of cos so that, if y = x.cos(), then x = y.acos()
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64
acos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629
acos operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0,
              1.5707963267948966,  1.266103672779499],
             dtype='float64')
-
any()¶ Return whether any element is True in Index.
-
append(other)¶ Append a collection of Index options together.
- Parameters
- other : Index or list/tuple of indices
- Returns
- appended : Index
Examples
>>> import cudf
>>> idx = cudf.Index([1, 2, 10, 100])
>>> idx
Int64Index([1, 2, 10, 100], dtype='int64')
>>> other = cudf.Index([200, 400, 50])
>>> other
Int64Index([200, 400, 50], dtype='int64')
>>> idx.append(other)
Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')
append also accepts a list of Index objects:
>>> idx.append([other, other])
Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
-
argsort(ascending=True, **kwargs)¶ Return the integer indices that would sort the index.
- Parameters
- ascending : bool, default True
If True, returns the indices for ascending order. If False, returns the indices for descending order.
- Returns
- array
A cupy array containing integer indices that would sort the index if used as an indexer.
-
asin()¶ Get Trigonometric inverse sine, element-wise.
The inverse of sine so that, if y = x.sin(), then x = y.asin()
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.asin()
0   -1.570796
1    0.000000
2    1.570796
3    0.330314
4    0.523599
dtype: float64
asin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.asin()
      first    second
0 -1.570796  0.236190
1  0.000000  0.304693
2  0.523599  0.100167
asin operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64')
>>> index.asin()
Float64Index([-1.5707963267948966, 0.41151684606748806,
               1.5707963267948966,  0.3046926540153975],
             dtype='float64')
-
astype(dtype, copy=False)¶ Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.
- Parameters
- dtype : numpy dtype
Use a numpy.dtype to cast entire Index object to.
- copy : bool, default False
If copy is True, astype always returns a newly allocated object. If copy is False (the default) and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.
- Returns
- Index
Index with values cast to specified dtype.
-
atan()¶ Get Trigonometric inverse tangent, element-wise.
The inverse of tan so that, if y = x.tan(), then x = y.atan()
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10])
>>> ser
0    -1.00000
1     0.00000
2     1.00000
3     0.32434
4     0.50000
5   -10.00000
dtype: float64
>>> ser.atan()
0   -0.785398
1    0.000000
2    0.785398
3    0.313635
4    0.463648
5   -1.471128
dtype: float64
atan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.atan()
      first    second
0 -0.785398  0.229864
1 -1.471128  0.291457
2  0.463648  1.471128
atan operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.atan()
Float64Index([-0.7853981633974483, 0.3805063771123649, 0.7853981633974483,
                             0.0, 0.2914567944778671],
             dtype='float64')
-
clip(lower=None, upper=None, inplace=False, axis=1)¶ Trim values at input threshold(s).
Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.
- Parameters
- lower : scalar or array_like, default None
Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.
- upper : scalar or array_like, default None
Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series/Index, upper is expected to be a scalar or an array of size 1.
- inplace : bool, default False
If True, modify the caller in place instead of returning a new object.
- Returns
- Clipped DataFrame/Series/Index/MultiIndex
Examples
>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4], "b": ['a', 'b', 'c', 'd']})
>>> df.clip(lower=[2, 'b'], upper=[3, 'c'])
   a  b
0  2  b
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=None, upper=[3, 'c'])
   a  b
0  1  a
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=[2, 'b'], upper=None)
   a  b
0  2  b
1  2  b
2  3  c
3  4  d
>>> df.clip(lower=2, upper=3, inplace=True)
>>> df
   a  b
0  2  2
1  2  3
2  3  3
3  3  3
>>> import cudf
>>> sr = cudf.Series([1, 2, 3, 4])
>>> sr.clip(lower=2, upper=3)
0    2
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=None, upper=3)
0    1
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True)
>>> sr
0    2
1    2
2    3
3    4
dtype: int64
-
copy(deep=True)¶ Make a copy of this object.
- Parameters
- deep : bool, default True
Make a deep copy of the data. With deep=False, the data is not copied.
- Returns
- copy : Index
-
cos()¶ Get Trigonometric cosine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.cos()
0    1.000000
1    0.947861
2    0.877583
3    0.525322
4   -0.448074
5   -0.598460
6   -0.283691
dtype: float64
cos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.cos()
      first    second
0  1.000000  0.862319
1  0.283662 -0.283691
2 -0.839072 -0.839039
3 -0.759688 -0.022097
cos operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.cos()
Float64Index([ 0.9210609940028851,  0.8623188722876839,
              -0.5984600690578581, -0.4480736161291701],
             dtype='float64')
-
difference(other, sort=None)¶ Return a new Index with elements from the index that are not in other.
This is the set difference of two Index objects.
- Parameters
- other : Index or array-like
- sort : False or None, default None
Whether to sort the resulting index. By default, the values are attempted to be sorted, but any TypeError from incomparable elements is caught by cudf.
* None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.
* False : Do not sort the result.
- Returns
- difference : Index
Examples
>>> import cudf
>>> idx1 = cudf.Index([2, 1, 3, 4])
>>> idx1
Int64Index([2, 1, 3, 4], dtype='int64')
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx2
Int64Index([3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
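Under sort=None, difference() tries to sort the result and falls back silently on TypeError. This behaviour can be sketched in pure Python. `difference_sketch` is a hypothetical helper on plain lists, not cudf's GPU implementation. Like a set difference, it keeps each surviving value only once, at its first occurrence.

```python
from typing import Any, List, Optional, Sequence


def difference_sketch(left: Sequence[Any], right: Sequence[Any],
                      sort: Optional[bool] = None) -> List[Any]:
    """Hypothetical helper: elements of `left` that are not in `right`."""
    if sort not in (None, False):
        raise ValueError("sort must be None or False")
    exclude = set(right)
    seen = set()
    out: List[Any] = []
    for value in left:
        if value not in exclude and value not in seen:
            seen.add(value)
            out.append(value)
    if sort is None:
        try:
            out = sorted(out)
        except TypeError:
            pass  # incomparable elements: keep the original order
    return out


print(difference_sketch([2, 1, 3, 4], [3, 4, 5, 6]))              # [1, 2]
print(difference_sketch([2, 1, 3, 4], [3, 4, 5, 6], sort=False))  # [2, 1]
```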
-
drop_duplicates(keep='first')¶ Return Index with duplicate values removed.
- Parameters
- keep : {‘first’, ‘last’, False}, default ‘first’
* ‘first’ : Drop duplicates except for the first occurrence.
* ‘last’ : Drop duplicates except for the last occurrence.
* False : Drop all duplicates.
- Returns
- deduplicated : Index
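The three keep options can be modelled on a plain list. The sketch below uses a hypothetical `drop_duplicates_sketch` that preserves the original order of the surviving values. That ordering is a choice of the sketch, and cudf's GPU implementation is not guaranteed to return values in the same order.

```python
from collections import Counter
from typing import Any, List, Sequence, Union


def drop_duplicates_sketch(values: Sequence[Any],
                           keep: Union[str, bool] = "first") -> List[Any]:
    """Hypothetical helper illustrating keep='first' / 'last' / False."""
    if keep == "first":
        seen = set()
        out: List[Any] = []
        for value in values:
            if value not in seen:  # only the first occurrence survives
                seen.add(value)
                out.append(value)
        return out
    if keep == "last":
        # Keeping the last occurrence = keeping the first one of the reversed input.
        return drop_duplicates_sketch(list(values)[::-1], "first")[::-1]
    if keep is False:
        # Drop every value that occurs more than once.
        counts = Counter(values)
        return [value for value in values if counts[value] == 1]
    raise ValueError("keep must be 'first', 'last' or False")


print(drop_duplicates_sketch([1, 2, 1, 3, 2]))          # [1, 2, 3]
print(drop_duplicates_sketch([1, 2, 1, 3, 2], "last"))  # [1, 3, 2]
print(drop_duplicates_sketch([1, 2, 1, 3, 2], False))   # [3]
```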
-
dropna(how='any')¶ Return an Index with null values removed.
- Parameters
- how : {‘any’, ‘all’}, default ‘any’
If the Index is a MultiIndex, drop the value when any or all levels are NaN.
- Returns
- valid : Index
Examples
>>> import cudf
>>> index = cudf.Index(['a', None, 'b', 'c'])
>>> index
StringIndex(['a' None 'b' 'c'], dtype='object')
>>> index.dropna()
StringIndex(['a' 'b' 'c'], dtype='object')
Using dropna on a MultiIndex:
>>> midx = cudf.MultiIndex(
...     levels=[[1, None, 4, None], [1, 2, 5]],
...     codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...     names=["x", "y"],
... )
>>> midx
MultiIndex(levels=[0       1
1    null
2       4
3    null
dtype: int64, 0    1
1    2
2    5
dtype: int64],
codes=   x  y
0  0  0
1  0  2
2  1  1
3  2  1
4  3  0)
>>> midx.dropna()
MultiIndex(levels=[0    1
1    4
dtype: int64, 0    1
1    2
2    5
dtype: int64],
codes=   x  y
0  0  0
1  0  2
2  1  1)
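For a MultiIndex, `how` decides whether a single missing level is enough to drop an entry. The sketch below models each entry as a tuple with `None` for null, which is a simplification. `dropna_sketch` is hypothetical and not part of cudf's API. It reproduces the MultiIndex example above, whose decoded entries are (1, 1), (1, 5), (None, 2), (4, 2), (None, 1).

```python
from typing import Any, List, Sequence, Tuple


def dropna_sketch(entries: Sequence[Tuple[Any, ...]], how: str = "any") -> List[Tuple[Any, ...]]:
    """Hypothetical helper: drop entries whose levels are null according to `how`."""
    if how == "any":
        is_missing = any   # drop if at least one level is null
    elif how == "all":
        is_missing = all   # drop only if every level is null
    else:
        raise ValueError("how must be 'any' or 'all'")
    return [e for e in entries if not is_missing(level is None for level in e)]


midx = [(1, 1), (1, 5), (None, 2), (4, 2), (None, 1)]
print(dropna_sketch(midx))                                  # [(1, 1), (1, 5), (4, 2)]
print(dropna_sketch([(None, None), (None, 1)], how="all"))  # [(None, 1)]
```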
-
property
dtype¶ dtype of the underlying values in GenericIndex.
-
property
empty¶ Indicator whether Index is empty.
True if Index is entirely empty (no items).
- Returns
- out : bool
If Index is empty, return True, if not return False.
-
equals(other)¶ Determine if two Index objects contain the same elements.
- Returns
- out : bool
True if “other” is an Index and it has the same elements as calling index; False otherwise.
-
exp()¶ Get the exponential of all elements, element-wise.
Exponential is the inverse of the log function, so that x.exp().log() = x
- Returns
- DataFrame/Series/Index
Result of the element-wise exponential.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.exp()
0    3.678794e-01
1    1.000000e+00
2    2.718282e+00
3    1.383117e+00
4    1.648721e+00
5    4.539993e-05
6    2.688117e+43
dtype: float64
exp operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.exp()
      first        second
0  0.367879      1.263644
1  0.000045      1.349859
2  1.648721  22026.465795
exp operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.exp()
Float64Index([0.36787944117144233, 1.4918246976412703, 2.718281828459045,
                             1.0, 1.3498588075760032],
             dtype='float64')
-
fillna(value, downcast=None)¶ Fill null values with the specified value.
- Parameters
- value : scalar
Scalar value to use to fill nulls. This value cannot be list-like.
- downcast : dict, default None
This parameter is currently NON-FUNCTIONAL.
- Returns
- filled : Index
Examples
>>> import cudf
>>> index = cudf.Index([1, 2, None, 4])
>>> index
Int64Index([1, 2, null, 4], dtype='int64')
>>> index.fillna(3)
Int64Index([1, 2, 3, 4], dtype='int64')
-
find_label_range(first, last)¶ Find range that starts with first and ends with last, inclusively.
- Returns
- begin, end : 2-tuple of int
The starting index and the ending index. The last value occurs at position end - 1.
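If the labels are sorted in ascending order, the half-open (begin, end) pair can be computed with two binary searches, as in this pure-Python sketch. `find_label_range_sketch` is a hypothetical helper, not cudf's implementation. It assumes sorted labels and does not handle a missing (None) bound.

```python
import bisect
from typing import Sequence, Tuple


def find_label_range_sketch(labels: Sequence[int], first: int, last: int) -> Tuple[int, int]:
    """Hypothetical helper for ascending labels; returns half-open (begin, end)."""
    begin = bisect.bisect_left(labels, first)  # first position with label >= first
    end = bisect.bisect_right(labels, last)    # one past the last label <= last
    return begin, end


labels = [10, 20, 20, 30, 40]
begin, end = find_label_range_sketch(labels, 20, 30)
print(begin, end)       # 1 4
print(labels[end - 1])  # 30: the last value sits at position end - 1
```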
-
classmethod
from_pandas(index, nan_as_null=None)¶ Convert from a Pandas Index.
- Parameters
- index : Pandas Index object
A Pandas Index object which has to be converted to cuDF Index.
- nan_as_null : bool, Default None
If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.
- Raises
- TypeError for invalid input type.
Examples
>>> import cudf
>>> import pandas as pd
>>> import numpy as np
>>> data = [10, 20, 30, np.nan]
>>> pdi = pd.Index(data)
>>> cudf.core.index.Index.from_pandas(pdi)
Index(['10.0', '20.0', '30.0', 'null'], dtype='object')
>>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False)
Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
-
get_level_values(level)¶ Return an Index of values for requested level.
This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.
- Parameters
- level : int or str
It is either the integer position or the name of the level.
- Returns
- Index
Calling object, as there is only one level in the Index.
See also
cudf.core.multiindex.MultiIndex.get_level_values : Get values for a level of a MultiIndex.
Notes
For Index, level should be 0, since there are no multiple levels.
Examples
>>> import cudf
>>> idx = cudf.core.index.StringIndex(["a", "b", "c"])
>>> idx.get_level_values(0)
StringIndex(['a' 'b' 'c'], dtype='object')
-
get_slice_bound(label, side, kind)¶ Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.
- Parameters
- label : object
- side : {‘left’, ‘right’}
- kind : {‘ix’, ‘loc’, ‘getitem’}
- Returns
- int
Index of label.
-
property
gpu_values¶ View the data as a numba device array object
-
interleave_columns()¶ Interleave Series columns of a table into a single column.
Converts the column major table cols into a row major column.
- Parameters
- cols : input Table containing columns to interleave.
- Returns
- The interleaved columns as a single column
Examples
>>> import cudf
>>> df = cudf.DataFrame({0: ['A1', 'A2', 'A3'], 1: ['B1', 'B2', 'B3']})
>>> df
    0   1
0  A1  B1
1  A2  B2
2  A3  B3
>>> df.interleave_columns()
0    A1
1    B1
2    A2
3    B2
4    A3
5    B3
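Interleaving walks a column-major table in row-major order: take row 0 of every column, then row 1 of every column, and so on. The pure-Python sketch below uses a hypothetical `interleave_sketch` that operates on a list of equal-length columns rather than a cudf table. It produces the same output order as the example above.

```python
from typing import Any, List, Sequence


def interleave_sketch(columns: Sequence[Sequence[Any]]) -> List[Any]:
    """Hypothetical helper: flatten columns into a single row-major list."""
    n_rows = len(columns[0]) if columns else 0
    if any(len(col) != n_rows for col in columns):
        raise ValueError("all columns must have the same length")
    # Outer loop over rows, inner loop over columns => row-major order.
    return [col[row] for row in range(n_rows) for col in columns]


print(interleave_sketch([["A1", "A2", "A3"], ["B1", "B2", "B3"]]))
# ['A1', 'B1', 'A2', 'B2', 'A3', 'B3']
```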
-
property
is_monotonic¶ Alias for is_monotonic_increasing.
-
property
is_monotonic_decreasing¶ Return if the index is monotonic decreasing (only equal or decreasing) values.
-
property
is_monotonic_increasing¶ Return whether the index is monotonic increasing (only equal or increasing values).
-
property
is_unique¶ Return whether the index has unique values.
-
isin(values)¶ Return a boolean array where the index values are in values.
Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.
- Parameters
- valuesset, list-like, Index
Sought values.
- Returns
- is_containedcupy array
CuPy array of boolean values.
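The membership test can be sketched in pure Python as one boolean per index element. The helper `isin` below is hypothetical, returns a list rather than a CuPy array, and does not model nulls.

```python
def isin(index_values, values):
    """Hypothetical helper: one boolean per index element."""
    lookup = set(values)  # hashing makes each membership test O(1) on average
    return [value in lookup for value in index_values]


print(isin([10, 20, 30, 40], [20, 40, 50]))
# [False, True, False, True]
```

The result always has the same length as the index, whatever the size of values.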
-
isna()¶ Identify missing values. Alias for isnull.
-
isnull()¶ Identify missing values.
-
join(other, how='left', level=None, return_indexers=False, sort=False)¶ Compute join_index and indexers to conform data structures to the new index.
- Parameters
- otherIndex.
- how{‘left’, ‘right’, ‘inner’, ‘outer’}
- return_indexersbool, default False
- sortbool, default False
Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).
- Returns
- index
Examples
>>> import cudf >>> lhs = cudf.DataFrame( ... {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b'] ... ).index >>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index >>> lhs.join(rhs, how='inner') MultiIndex(levels=[0 1 1 3 dtype: int64, 0 2 1 4 dtype: int64], codes= a b 0 1 1 1 0 0)
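For flat indexes with unique labels, which labels survive each how option can be sketched with plain Python sets. This is an illustration under that uniqueness assumption, not cuDF's algorithm: `join_labels` is hypothetical, MultiIndex joins on a shared level are not modeled, and the unsorted order it produces is a choice of the sketch (the documentation only says that order depends on the join type).

```python
def join_labels(left, right, how="left", sort=False):
    """Hypothetical helper: labels kept by each join type, assuming unique labels."""
    left_set, right_set = set(left), set(right)
    if how == "left":
        result = list(left)
    elif how == "right":
        result = list(right)
    elif how == "inner":
        # Keep left labels that also appear on the right.
        result = [x for x in left if x in right_set]
    elif how == "outer":
        # All left labels, then right labels missing from the left.
        result = list(left) + [x for x in right if x not in left_set]
    else:
        raise ValueError(f"unsupported how: {how!r}")
    return sorted(result) if sort else result


print(join_labels([2, 3, 1], [1, 4, 3], how="inner", sort=True))  # [1, 3]
print(join_labels([2, 3, 1], [1, 4, 3], how="outer", sort=True))  # [1, 2, 3, 4]
```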
-
log()¶ Get the natural logarithm of all elements, element-wise.
Natural logarithm is the inverse of the exp function, so that x.log().exp() = x for every positive x.
- Returns
- DataFrame/Series/Index
Result of the element-wise natural logarithm.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100]) >>> ser 0 -1.00000 1 0.00000 2 1.00000 3 0.32434 4 0.50000 5 -10.00000 6 100.00000 dtype: float64 >>> ser.log() 0 NaN 1 -inf 2 0.000000 3 -1.125963 4 -0.693147 5 NaN 6 4.605170 dtype: float64
log operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5], ... 'second': [0.234, 0.3, 10]}) >>> df first second 0 -1.0 0.234 1 -10.0 0.300 2 0.5 10.000 >>> df.log() first second 0 NaN -1.452434 1 NaN -1.203973 2 -0.693147 2.302585
log operation on Index:
>>> index = cudf.Index([10, 11, 500.0]) >>> index Float64Index([10.0, 11.0, 500.0], dtype='float64') >>> index.log() Float64Index([2.302585092994046, 2.3978952727983707, 6.214608098422191], dtype='float64')
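The results above follow the usual domain rules: positive inputs have a real logarithm, zero maps to -inf, and negative inputs map to NaN. Python's `math.log` raises an error outside the positive range, so the hypothetical `elementwise_log` helper below applies those rules explicitly; it is a sketch of the behavior shown, not cuDF's kernel.

```python
import math


def elementwise_log(values):
    """Hypothetical helper: natural log with 0 -> -inf and negatives/NaN -> NaN."""
    result = []
    for v in values:
        if v > 0:
            result.append(math.log(v))
        elif v == 0:
            result.append(-math.inf)
        else:
            # Negative inputs (and NaN, which fails both comparisons) give NaN.
            result.append(math.nan)
    return result


out = elementwise_log([-1, 0, 1, 100])
print(out)
# Round trip exp(log(x)) recovers x, but only for positive x.
print(math.isclose(math.exp(math.log(0.32434)), 0.32434))  # True
```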
-
mask(cond, other=None, inplace=False)¶ Replace values where the condition is True.
- Parameters
- condbool Series/DataFrame, array-like
Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.
- other: scalar, list of scalars, Series/DataFrame
Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.
For a DataFrame, other must be a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.
For a Series, other must be a scalar or a Series-like object of the same length.
- inplacebool, default False
Whether to perform the operation in place on the data.
- Returns
- Same type as caller
Examples
>>> import cudf >>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]}) >>> df.mask(df % 2 == 0, [-1, -1]) A B 0 1 3 1 -1 5 2 5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0]) >>> ser.mask(ser > 2, 10) 0 10 1 10 2 2 3 1 4 0 dtype: int64 >>> ser.mask(ser > 2) 0 null 1 null 2 2 3 1 4 0 dtype: int64
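A list-based sketch of the Series form of mask, with None standing in for null. The `mask` helper below is hypothetical; it assumes other is a scalar or a list aligned element by element, and it does not cover the DataFrame form, where a list supplies one value per column.

```python
def mask(values, cond, other=None):
    """Hypothetical helper: replace values where cond is True (None means null)."""
    if len(cond) != len(values):
        raise ValueError("cond must have the same length as values")
    if isinstance(other, (list, tuple)):
        if len(other) != len(values):
            raise ValueError("other must have the same length as values")
        replacements = list(other)
    else:
        # A scalar (or None) is broadcast to every position.
        replacements = [other] * len(values)
    return [r if c else v for v, c, r in zip(values, cond, replacements)]


ser = [4, 3, 2, 1, 0]
print(mask(ser, [x > 2 for x in ser], 10))  # [10, 10, 2, 1, 0]
print(mask(ser, [x > 2 for x in ser]))      # [None, None, 2, 1, 0]
```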
-
max()¶ Return the maximum value of the Index.
- Returns
- scalar
Maximum value.
See also
Index.minReturn the minimum value in an Index.
cudf.core.series.Series.maxReturn the maximum value in a Series.
cudf.core.dataframe.DataFrame.maxReturn the maximum values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.max() 3
-
memory_usage(deep=False)¶ Memory usage of the values.
- Parameters
- deepbool
Introspect the data deeply, interrogate object dtypes for system-level memory consumption.
- Returns
- bytes used
-
min()¶ Return the minimum value of the Index.
- Returns
- scalar
Minimum value.
See also
Index.maxReturn the maximum value in an Index.
cudf.core.series.Series.minReturn the minimum value in a Series.
cudf.core.dataframe.DataFrame.minReturn the minimum values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.min() 1
-
property
name¶ Returns the name of the Index.
-
property
names¶ Returns a tuple containing the name of the Index.
-
property
ndim¶ Dimension of the data. Apart from MultiIndex ndim is always 1.
-
notna()¶ Identify non-missing values. Alias for notnull.
-
notnull()¶ Identify non-missing values.
-
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)¶ Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.
- Parameters
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
Index to direct ranking.
- method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’
How to rank the group of records that have the same value (i.e. ties):
* average: average rank of the group
* min: lowest rank in the group
* max: highest rank in the group
* first: ranks assigned in order they appear in the array
* dense: like ‘min’, but rank always increases by 1 between groups.
- numeric_onlybool, optional
For DataFrame objects, rank only numeric columns if set to True.
- na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’
How to rank NaN values:
* keep: assign NaN rank to NaN values
* top: assign smallest rank to NaN values if ascending
* bottom: assign highest rank to NaN values if ascending.
- ascendingbool, default True
Whether or not the elements should be ranked in ascending order.
- pctbool, default False
Whether or not to display the returned rankings in percentile form.
- Returns
- same type as caller
Return a Series or DataFrame with data ranks as values.
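The tie-breaking methods are easiest to compare side by side. The hypothetical `rank` helper below is a pure-Python sketch for a single list of comparable values; it does not model na_option or pct, and it is not cuDF's implementation.

```python
def rank(values, method="average", ascending=True):
    """Hypothetical helper: 1-based ranks with pandas-style tie methods."""
    n = len(values)
    # Python's sort is stable even with reverse=True, so ties keep input order.
    order = sorted(range(n), key=lambda i: values[i], reverse=not ascending)
    ranks = [0.0] * n
    dense = 0
    pos = 0
    while pos < n:
        # Extend [pos, end] over the run of values tied with order[pos].
        end = pos
        while end + 1 < n and values[order[end + 1]] == values[order[pos]]:
            end += 1
        dense += 1
        for k in range(pos, end + 1):
            i = order[k]
            if method == "average":
                ranks[i] = (pos + end) / 2 + 1
            elif method == "min":
                ranks[i] = pos + 1
            elif method == "max":
                ranks[i] = end + 1
            elif method == "first":
                ranks[i] = k + 1
            elif method == "dense":
                ranks[i] = dense
            else:
                raise ValueError(f"unknown method: {method!r}")
        pos = end + 1
    return [float(r) for r in ranks]


data = [10, 20, 20, 30]
for m in ("average", "min", "max", "first", "dense"):
    print(m, rank(data, m))
```

For these inputs the tied pair gets 2.5 under average, 2 under min, 3 under max, 2 and 3 under first, and 2 under dense, where the next group then gets 3 instead of 4.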
-
rename(name, inplace=False)¶ Alter Index name.
Defaults to returning new index.
- Parameters
- namelabel
Name(s) to set.
- Returns
- Index
-
repeat(repeats, axis=None)¶ Repeats elements consecutively.
Returns a new object of the caller's type (DataFrame/Series/Index) in which each element of the current object is repeated consecutively a given number of times.
- Parameters
- repeatsint, or array of ints
The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.
- Returns
- Series/DataFrame/Index
A newly created object of same type as caller with repeated elements.
Examples
>>> import cudf >>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]}) >>> df a b 0 1 10 1 2 20 2 3 30 >>> df.repeat(3) a b 0 1 10 0 1 10 0 1 10 1 2 20 1 2 20 1 2 20 2 3 30 2 3 30 2 3 30
Repeat on Series
>>> s = cudf.Series([0, 2]) >>> s 0 0 1 2 dtype: int64 >>> s.repeat([3, 4]) 0 0 0 0 0 0 1 2 1 2 1 2 1 2 dtype: int64 >>> s.repeat(2) 0 0 0 0 1 2 1 2 dtype: int64
Repeat on Index
>>> index = cudf.Index([10, 22, 33, 55]) >>> index Int64Index([10, 22, 33, 55], dtype='int64') >>> index.repeat(5) Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33, 33, 33, 33, 33, 55, 55, 55, 55, 55], dtype='int64')
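The two accepted forms of repeats, a single int or one count per element, can be sketched with plain lists. `repeat` below is a hypothetical helper mirroring the documented rules (non-negative counts, zero gives an empty result), not cuDF's code.

```python
def repeat(values, repeats):
    """Hypothetical helper: repeat each element consecutively."""
    if isinstance(repeats, int):
        repeats = [repeats] * len(values)
    if len(repeats) != len(values):
        raise ValueError("repeats must be an int or match the number of elements")
    if any(r < 0 for r in repeats):
        raise ValueError("repeats must be non-negative")
    return [v for v, r in zip(values, repeats) for _ in range(r)]


print(repeat([0, 2], [3, 4]))  # [0, 0, 0, 2, 2, 2, 2]
print(repeat([0, 2], 2))       # [0, 0, 2, 2]
print(repeat([0, 2], 0))       # []
```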
-
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)¶ Return a random sample of items from an axis of object.
You can use random_state for reproducibility.
- Parameters
- nint, optional
Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.
- fracfloat, optional
Fraction of axis items to return. Cannot be used with n.
- replacebool, default False
Allow or disallow sampling of the same row more than once. replace == True is not yet supported for axis = 1/“columns”.
- weightsstr or ndarray-like, optional
Only supported for axis=1/“columns”.
- random_stateint or None, default None
Seed for the random number generator (if int), or None. If None, a random seed will be chosen.
- axis{0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index do not support axis=1.
- Returns
- Series or DataFrame or Index
A new object of same type as caller containing n items randomly sampled from the caller object.
Examples
>>> import cudf >>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]}) >>> df.sample(3) a 1 2 3 4 0 1
>>> sr = cudf.Series([1, 2, 3, 4, 5]) >>> sr.sample(10, replace=True) 1 4 3 1 2 4 0 5 0 1 4 5 4 1 0 2 0 3 3 2 dtype: int64
>>> df = cudf.DataFrame( ... {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]}) >>> df.sample(2, axis=1) a c 0 1 3 1 2 4
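The interplay of n, frac, replace and random_state can be sketched with Python's `random` module. `sample` below is hypothetical; rounding frac * length to the nearest integer is an assumption of the sketch, and a given seed will not reproduce cuDF's GPU sampling, only its own.

```python
import random


def sample(values, n=None, frac=None, replace=False, random_state=None):
    """Hypothetical helper: sampling semantics using Python's random module."""
    if n is not None and frac is not None:
        raise ValueError("specify either n or frac, not both")
    if n is None:
        # Documented default: n = 1 when frac is None (rounding is an assumption).
        n = 1 if frac is None else round(frac * len(values))
    rng = random.Random(random_state)  # an int seed makes results reproducible
    if replace:
        return rng.choices(values, k=n)  # the same item may appear repeatedly
    return rng.sample(values, n)  # raises ValueError if n > len(values)


print(sample([1, 2, 3, 4, 5], n=3, random_state=42))
print(sample([1, 2, 3, 4, 5], n=10, replace=True, random_state=0))
```

With replace=False, n cannot exceed the number of items; with replace=True it can, as in the Series example above.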
-
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)¶ Scatter to a list of dataframes.
Uses map_index to determine the destination of each row of the original DataFrame.
- Parameters
- map_indexSeries, str or list-like
Scatter assignment for each row
- map_sizeint
Length of output list. Must be greater than or equal to the number of unique values in map_index.
- keep_indexbool
Preserve the original index values for each row.
- Returns
- A list of cudf.DataFrame objects.
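The routing step can be sketched with plain lists, assuming map_index holds integer destinations in the range [0, map_size). `scatter_by_map` here is a hypothetical helper: cuDF's handling of column-name map_index values and of keep_index is not modeled, and keeping the input order within each bucket is a property of this sketch.

```python
def scatter_by_map(rows, map_index, map_size=None):
    """Hypothetical helper: send each row to the bucket named by its destination."""
    if len(map_index) != len(rows):
        raise ValueError("map_index needs one destination per row")
    if map_size is None:
        map_size = max(map_index) + 1 if map_index else 0
    if any(not 0 <= dest < map_size for dest in map_index):
        raise ValueError("every destination must lie in [0, map_size)")
    buckets = [[] for _ in range(map_size)]
    for row, dest in zip(rows, map_index):
        buckets[dest].append(row)  # input order is kept within a bucket
    return buckets


rows = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
print(scatter_by_map(rows, [1, 0, 1, 2]))
# [[('b', 2)], [('a', 1), ('c', 3)], [('d', 4)]]
```

Passing a map_size larger than the number of distinct destinations yields trailing empty buckets, which matches the requirement that map_size be at least the number of unique values.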
-
searchsorted(values, side='left', ascending=True, na_position='last')¶ Find indices where elements should be inserted to maintain order
- Parameters
- valuesFrame (shape must be consistent with self)
Values to be hypothetically inserted into self.
- sidestr {‘left’, ‘right’} optional, default ‘left’
If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.
- ascendingbool optional, default True
Sorted Frame is in ascending order (otherwise descending).
- na_positionstr {‘last’, ‘first’} optional, default ‘last’
Position of null values in sorted order.
- Returns
- 1-D cupy array of insertion points
Examples
>>> s = cudf.Series([1, 2, 3]) >>> s.searchsorted(4) 3 >>> s.searchsorted([0, 4]) array([0, 3], dtype=int32) >>> s.searchsorted([1, 3], side='left') array([0, 2], dtype=int32) >>> s.searchsorted([1, 3], side='right') array([1, 3], dtype=int32)
If the values are not monotonically sorted, wrong locations may be returned:
>>> s = cudf.Series([2, 1, 3]) >>> s.searchsorted(1) 0 # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]}) >>> df a b 0 1 10 1 3 12 2 5 14 3 7 16 >>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6], ... 'b': [10, 11, 13, 15]}) >>> values_df a b 0 0 10 1 2 11 2 5 13 3 6 15 >>> df.searchsorted(values_df, ascending=False) array([4, 4, 4, 0], dtype=int32)
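For ascending data, the insertion points match Python's `bisect` module, and a multi-column Frame can be pictured as rows of tuples compared column by column. The hypothetical `searchsorted` helper below assumes that lexicographic ordering; it does not model ascending=False or na_position, and it reproduces the wrong answer for unsorted input shown above.

```python
import bisect


def searchsorted(sorted_rows, values, side="left"):
    """Hypothetical helper: insertion points; tuples compare column by column."""
    if side == "left":
        find = bisect.bisect_left
    elif side == "right":
        find = bisect.bisect_right
    else:
        raise ValueError("side must be 'left' or 'right'")
    return [find(sorted_rows, v) for v in values]


print(searchsorted([1, 2, 3], [0, 4]))                # [0, 3]
print(searchsorted([1, 2, 3], [1, 3], side="right"))  # [1, 3]

# Two columns: each row is an (a, b) tuple in ascending order.
rows = [(1, 10), (3, 12), (5, 14), (7, 16)]
print(searchsorted(rows, [(0, 10), (2, 11), (5, 13), (6, 15)]))  # [0, 1, 2, 3]

# Unsorted input gives a meaningless answer, exactly as the note above warns.
print(searchsorted([2, 1, 3], [1]))  # [0]
```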
-
property
shape¶ Returns a tuple representing the dimensionality of the Index.
-
shift(periods=1, freq=None, axis=0, fill_value=None)¶ Shift values by periods positions.
-
sin()¶ Get Trigonometric sine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360]) >>> ser 0 0.00000 1 0.32434 2 0.50000 3 45.00000 4 90.00000 5 180.00000 6 360.00000 dtype: float64 >>> ser.sin() 0 0.000000 1 0.318683 2 0.479426 3 0.850904 4 0.893997 5 -0.801153 6 0.958916 dtype: float64
sin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15], ... 'second': [100.0, 360, 720, 300]}) >>> df first second 0 0.0 100.0 1 5.0 360.0 2 10.0 720.0 3 15.0 300.0 >>> df.sin() first second 0 0.000000 -0.506366 1 -0.958924 0.958916 2 -0.544021 -0.544072 3 0.650288 -0.999756
sin operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90]) >>> index Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64') >>> index.sin() Float64Index([-0.3894183423086505, -0.5063656411097588, 0.8011526357338306, 0.8939966636005579], dtype='float64')
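As the outputs above show (sin(90) is 0.893997, not 1), the trigonometric methods interpret their inputs as radians. Python's `math` module uses the same convention, so it can double-check individual values; degrees need converting first, here with `math.radians`.

```python
import math

# Inputs are radians: 90 here is about 14.3 turns, not a right angle.
print(round(math.sin(90), 6))                # 0.893997, as in ser.sin() above
print(round(math.sin(math.radians(90)), 6))  # 1.0, after converting degrees
```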
-
property
size¶ Return the number of elements in the underlying data.
- Returns
- sizeSize of the DataFrame / Index / Series / MultiIndex
Examples
Size of an empty dataframe is 0.
>>> import cudf >>> df = cudf.DataFrame() >>> df Empty DataFrame Columns: [] Index: [] >>> df.size 0 >>> df = cudf.DataFrame(index=[1, 2, 3]) >>> df Empty DataFrame Columns: [] Index: [1, 2, 3] >>> df.size 0
DataFrame with values
>>> df = cudf.DataFrame({'a': [10, 11, 12], ... 'b': ['hello', 'rapids', 'ai']}) >>> df a b 0 10 hello 1 11 rapids 2 12 ai >>> df.size 6 >>> df.index RangeIndex(start=0, stop=3) >>> df.index.size 3
Size of an Index
>>> index = cudf.Index([]) >>> index Float64Index([], dtype='float64') >>> index.size 0 >>> index = cudf.Index([1, 2, 3, 10]) >>> index Int64Index([1, 2, 3, 10], dtype='int64') >>> index.size 4
Size of a MultiIndex
>>> midx = cudf.MultiIndex( ... levels=[["a", "b", "c", None], ["1", None, "5"]], ... codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]], ... names=["x", "y"], ... ) >>> midx MultiIndex(levels=[0 a 1 b 2 c 3 None dtype: object, 0 1 1 None 2 5 dtype: object], codes= x y 0 0 0 1 0 2 2 1 1 3 2 1 4 3 0) >>> midx.size 5
-
sort_values(return_indexer=False, ascending=True, key=None)¶ Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
- Parameters
- return_indexerbool, default False
Should the indices that would sort the index be returned.
- ascendingbool, default True
Should the index values be sorted in an ascending order.
- keyNone, optional
This parameter is NON-FUNCTIONAL.
- Returns
- sorted_indexIndex
Sorted copy of the index.
- indexercupy.ndarray, optional
The indices that the index itself was sorted by.
See also
cudf.core.series.Series.sort_valuesSort values of a Series.
cudf.core.dataframe.DataFrame.sort_valuesSort values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([10, 100, 1, 1000]) >>> idx Int64Index([10, 100, 1, 1000], dtype='int64')
Sort values in ascending order (default behavior):
>>> idx.sort_values() Int64Index([1, 10, 100, 1000], dtype='int64')
Sort values in descending order, and also get the indices idx was sorted by:
>>> idx.sort_values(ascending=False, return_indexer=True) (Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2], dtype=int32))
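The relationship between the sorted copy and the indexer can be sketched with plain lists: the indexer lists the original positions in sorted order, and gathering the values at those positions yields the sorted copy. `sort_values` here is a hypothetical helper; its stable handling of ties is a property of Python's `sorted`, not a claim about cuDF.

```python
def sort_values(values, ascending=True, return_indexer=False):
    """Hypothetical helper: sorted copy, optionally with the sorting positions."""
    indexer = sorted(range(len(values)), key=values.__getitem__, reverse=not ascending)
    sorted_values = [values[i] for i in indexer]  # gather by position
    return (sorted_values, indexer) if return_indexer else sorted_values


idx = [10, 100, 1, 1000]
print(sort_values(idx))                                        # [1, 10, 100, 1000]
print(sort_values(idx, ascending=False, return_indexer=True))  # ([1000, 100, 10, 1], [3, 1, 0, 2])
```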
-
sqrt()¶ Get the non-negative square-root of all elements, element-wise.
- Returns
- DataFrame/Series/Index
Result of the non-negative square-root of each element.
Examples
>>> import cudf >>> ser = cudf.Series([10, 25, 81, 1.0, 100]) >>> ser 0 10.0 1 25.0 2 81.0 3 1.0 4 100.0 dtype: float64 >>> ser.sqrt() 0 3.162278 1 5.000000 2 9.000000 3 1.000000 4 10.000000 dtype: float64
sqrt operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-10.0, 100, 625], ... 'second': [1, 2, 0.4]}) >>> df first second 0 -10.0 1.0 1 100.0 2.0 2 625.0 0.4 >>> df.sqrt() first second 0 NaN 1.000000 1 10.0 1.414214 2 25.0 0.632456
sqrt operation on Index:
>>> index = cudf.Index([-10.0, 100, 625]) >>> index Float64Index([-10.0, 100.0, 625.0], dtype='float64') >>> index.sqrt() Float64Index([nan, 10.0, 25.0], dtype='float64')
-
sum()¶ Return the sum of all values of the Index.
- Returns
- scalar
Sum of all values.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.sum() 6
-
take(indices)¶ Gather only the specific subset of indices
- Parameters
- indices: An array-like that maps to values contained in this Index.
-
tan()¶ Get Trigonometric tangent, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360]) >>> ser 0 0.00000 1 0.32434 2 0.50000 3 45.00000 4 90.00000 5 180.00000 6 360.00000 dtype: float64 >>> ser.tan() 0 0.000000 1 0.336213 2 0.546302 3 1.619775 4 -1.995200 5 1.338690 6 -3.380140 dtype: float64
tan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15], ... 'second': [100.0, 360, 720, 300]}) >>> df first second 0 0.0 100.0 1 5.0 360.0 2 10.0 720.0 3 15.0 300.0 >>> df.tan() first second 0 0.000000 -0.587214 1 -3.380515 -3.380140 2 0.648361 0.648446 3 -0.855993 45.244742
tan operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90]) >>> index Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64') >>> index.tan() Float64Index([-0.4227932187381618, -0.587213915156929, -1.3386902103511544, -1.995200412208242], dtype='float64')
-
tile(count)¶ Repeats the rows from self DataFrame count times to form a new DataFrame.
- Parameters
- selfinput Table containing the rows to tile.
- countNumber of times to tile “rows”. Must be non-negative.
- Returns
- The table containing the tiled “rows”.
Examples
>>> import cudf >>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]]) >>> count = 2 >>> df.tile(count) 0 1 2 0 8 4 7 1 5 2 3 0 8 4 7 1 5 2 3
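Unlike repeat, which repeats each element consecutively, tile repeats the whole block of rows back to back. A minimal list-based sketch of that difference, using a hypothetical `tile` helper rather than cuDF's method:

```python
def tile(rows, count):
    """Hypothetical helper: repeat the whole sequence of rows count times."""
    if count < 0:
        raise ValueError("count must be non-negative")
    # list * count repeats references to the same row objects, which is fine
    # for this read-only illustration.
    return list(rows) * count


rows = [[8, 4, 7], [5, 2, 3]]
print(tile(rows, 2))  # [[8, 4, 7], [5, 2, 3], [8, 4, 7], [5, 2, 3]]
```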
-
to_array(fillna=None)¶ Get a dense numpy array for the data.
- Parameters
- fillnastr or None
Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.
Notes
If fillna is None, null values are skipped. Therefore, the output size could be smaller.
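The two fillna modes can be sketched with a plain list in which None stands in for a null entry. The hypothetical `to_array` helper below promotes every value to float in "pandas" mode because NaN is a float; how cuDF chooses the output dtype in each case is not modeled.

```python
import math


def to_array(values, fillna=None):
    """Hypothetical helper: the fillna behavior, with None standing in for null."""
    if fillna is None:
        # Nulls are skipped, so the output can be shorter than the input.
        return [v for v in values if v is not None]
    if fillna == "pandas":
        # Nulls become NaN, which forces a floating-point result.
        return [math.nan if v is None else float(v) for v in values]
    raise ValueError("fillna must be None or 'pandas'")


print(to_array([1, None, 3]))            # [1, 3]
print(to_array([1, None, 3], "pandas"))  # [1.0, nan, 3.0]
```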
-
to_arrow()¶ Convert Index to a PyArrow Array.
Examples
>>> import cudf >>> idx = cudf.Index([-3, 10, 15, 20]) >>> idx.to_arrow() <pyarrow.lib.Int64Array object at 0x7fcaa6f53440> [ -3, 10, 15, 20 ]
-
to_dlpack()¶ Converts a cuDF object into a DLPack tensor.
DLPack is an open-source memory tensor structure: dmlc/dlpack.
This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.
- Parameters
- cudf_objDataFrame, Series, Index, or Column
- Returns
- pycapsule_objPyCapsule
Output DLPack tensor pointer which is encapsulated in a PyCapsule object.
-
to_frame(index=True, name=None)¶ Create a DataFrame with a column containing this Index
- Parameters
- indexboolean, default True
Set the index of the returned DataFrame as the original Index
- namestr, default None
Name to be used for the column
- Returns
- DataFrame
cudf DataFrame
-
to_pandas()¶ Convert to a Pandas Index.
Examples
>>> import cudf >>> idx = cudf.Index([-3, 10, 15, 20]) >>> idx Int64Index([-3, 10, 15, 20], dtype='int64') >>> idx.to_pandas() Int64Index([-3, 10, 15, 20], dtype='int64') >>> type(idx.to_pandas()) <class 'pandas.core.indexes.numeric.Int64Index'> >>> type(idx) <class 'cudf.core.index.GenericIndex'>
-
to_series(index=None, name=None)¶ Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.
- Parameters
- indexIndex, optional
Index of resulting Series. If None, defaults to original index.
- namestr, optional
Name of resulting Series. If None, defaults to name of original index.
- Returns
- Series
The dtype will be based on the type of the Index values.
-
unique()¶ Return unique values in the index.
- Returns
- Index without duplicates
-
property
values¶ Return an array representing the data in the Index.
- Returns
- arrayA cupy array of data in the Index.
Examples
>>> import cudf >>> index = cudf.Index([1, -10, 100, 20]) >>> index.values array([ 1, -10, 100, 20]) >>> type(index.values) <class 'cupy.core.core.ndarray'>
-
property
values_host¶ Return a numpy representation of the Index.
Only the values in the Index will be returned.
- Returns
- outnumpy.ndarray
The values of the Index.
Examples
>>> import cudf >>> index = cudf.Index([1, -10, 100, 20]) >>> index.values_host array([ 1, -10, 100, 20]) >>> type(index.values_host) <class 'numpy.ndarray'>
-
where(cond, other=None)¶ Replace values where the condition is False.
- Parameters
- condbool array-like with the same length as self
Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.
- other: scalar, or array-like
Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.
- Returns
- Same type as caller
Examples
>>> import cudf >>> index = cudf.Index([4, 3, 2, 1, 0]) >>> index Int64Index([4, 3, 2, 1, 0], dtype='int64') >>> index.where(index > 2, 15) Int64Index([4, 3, 15, 15, 15], dtype='int64')
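where is the complement of mask: it keeps values where cond is True and replaces the rest. A list-based sketch with a scalar other, using a hypothetical `where` helper in which None stands in for null:

```python
def where(values, cond, other=None):
    """Hypothetical helper: keep values where cond is True, else use other."""
    if len(cond) != len(values):
        raise ValueError("cond must have the same length as values")
    return [v if c else other for v, c in zip(values, cond)]


index = [4, 3, 2, 1, 0]
keep = [x > 2 for x in index]
print(where(index, keep, 15))  # [4, 3, 15, 15, 15]
# Inverting cond replaces the other positions instead, as mask(cond) would.
print(where(index, [not c for c in keep], 15))  # [15, 15, 2, 1, 0]
```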
Float64Index¶
-
class
cudf.core.index.Float64Index(data=None, dtype=None, copy=False, name=None)¶ Immutable, ordered and sliceable sequence of float64 labels. The basic object storing row labels for all cuDF objects.
- Parameters
- dataarray-like (1-dimensional)/ DataFrame
If it is a DataFrame, it will return a MultiIndex
- dtypeNumPy dtype (default: None)
If dtype is None, the dtype that best fits the data is inferred.
- copybool
Make a copy of input data.
- nameobject
Name to be stored in the index.
- tupleize_colsbool (default: True)
When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.
- Returns
- Index
cudf Index
Examples
>>> import cudf >>> cudf.Index([1, 2, 3], dtype="uint64", name="a") UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]})) MultiIndex(levels=[0 1 1 2 dtype: int64, 0 2 1 3 dtype: int64], codes= a b 0 0 0 1 1 1)
- Attributes
dtypedtype of the underlying values in GenericIndex.
emptyIndicator whether Index is empty.
gpu_valuesView the data as a numba device array object
is_monotonicAlias for is_monotonic_increasing.
is_monotonic_decreasingReturn whether the index is monotonic decreasing (only equal or decreasing values).
is_monotonic_increasingReturn whether the index is monotonic increasing (only equal or increasing values).
is_uniqueReturn whether the index has unique values.
nameReturns the name of the Index.
namesReturns a tuple containing the name of the Index.
ndimDimension of the data.
shapeReturns a tuple representing the dimensionality of the Index.
sizeReturn the number of elements in the underlying data.
valuesReturn an array representing the data in the Index.
values_hostReturn a numpy representation of the Index.
Methods
acos()Get Trigonometric inverse cosine, element-wise.
any()Return whether any element is True in the Index.
append(other)Append a collection of Index options together.
argsort([ascending])Return the integer indices that would sort the index.
asin()Get Trigonometric inverse sine, element-wise.
astype(dtype[, copy])Create an Index with values cast to dtypes.
atan()Get Trigonometric inverse tangent, element-wise.
clip([lower, upper, inplace, axis])Trim values at input threshold(s).
copy([deep])Make a copy of this object.
cos()Get Trigonometric cosine, element-wise.
difference(other[, sort])Return a new Index with elements from the index that are not in other.
drop_duplicates([keep])Return Index with duplicate values removed
dropna([how])Return an Index with null values removed.
equals(other)Determine if two Index objects contain the same elements.
exp()Get the exponential of all elements, element-wise.
fillna(value[, downcast])Fill null values with the specified value.
find_label_range(first, last)Find range that starts with first and ends with last, inclusively.
from_pandas(index[, nan_as_null])Convert from a Pandas Index.
get_level_values(level)Return an Index of values for requested level.
get_slice_bound(label, side, kind)Calculate slice bound that corresponds to given label.
interleave_columns()Interleave Series columns of a table into a single column.
isin(values)Return a boolean array where the index values are in values.
isna()Identify missing values.
isnull()Identify missing values.
join(other[, how, level, return_indexers, sort])Compute join_index and indexers to conform data structures to the new index.
log()Get the natural logarithm of all elements, element-wise.
mask(cond[, other, inplace])Replace values where the condition is True.
max()Return the maximum value of the Index.
memory_usage([deep])Memory usage of the values.
min()Return the minimum value of the Index.
notna()Identify non-missing values.
notnull()Identify non-missing values.
rank([axis, method, numeric_only, …])Compute numerical data ranks (1 through n) along axis.
rename(name[, inplace])Alter Index name.
repeat(repeats[, axis])Repeats elements consecutively.
sample([n, frac, replace, weights, …])Return a random sample of items from an axis of object.
scatter_by_map(map_index[, map_size, keep_index])Scatter to a list of dataframes.
searchsorted(values[, side, ascending, …])Find indices where elements should be inserted to maintain order
shift([periods, freq, axis, fill_value])Shift values by periods positions.
sin()Get Trigonometric sine, element-wise.
sort_values([return_indexer, ascending, key])Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
sqrt()Get the non-negative square-root of all elements, element-wise.
sum()Return the sum of all values of the Index.
take(indices)Gather only the specific subset of indices
tan()Get Trigonometric tangent, element-wise.
tile(count)Repeats the rows from self DataFrame count times to form a new DataFrame.
to_array([fillna])Get a dense numpy array for the data.
to_arrow()Convert Index to a PyArrow Array.
to_dlpack()Converts a cuDF object into a DLPack tensor.
to_frame([index, name])Create a DataFrame with a column containing this Index
to_pandas()Convert to a Pandas Index.
to_series([index, name])Create a Series with both index and values equal to the index keys.
unique()Return unique values in the index.
where(cond[, other])Replace values where the condition is False.
replace
-
acos()¶ Get Trigonometric inverse cosine, element-wise.
The inverse of cos so that, if y = x.cos(), then x = y.acos() for x in [0, π].
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5]) >>> ser.acos() 0 3.141593 1 1.570796 2 0.000000 3 1.240482 4 1.047198 dtype: float64
acos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5], ... 'second': [0.234, 0.3, 0.1]}) >>> df first second 0 -1.0 0.234 1 0.0 0.300 2 0.5 0.100 >>> df.acos() first second 0 3.141593 1.334606 1 1.570796 1.266104 2 1.047198 1.470629
acos operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64') >>> index.acos() Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0, 1.5707963267948966, 1.266103672779499], dtype='float64')
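A quick check with Python's `math` module shows why the inverse relationship is limited to the principal range: acos always returns a value in [0, π], so an input outside that range comes back as a different angle with the same cosine.

```python
import math

x = 1.0
print(math.isclose(math.acos(math.cos(x)), x))  # True: 1.0 lies in [0, pi]

x = 4.0
print(math.isclose(math.acos(math.cos(x)), x))  # False: acos maps back into [0, pi]
print(round(math.acos(math.cos(x)), 6))         # 2.283185, i.e. 2*pi - 4
```

The same caveat applies to asin, whose results lie in [-π/2, π/2], and atan, whose results lie in (-π/2, π/2).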
-
any()¶ Return whether any element is True in the Index.
-
append(other)¶ Append a collection of Index options together.
- Parameters
- otherIndex or list/tuple of indices
- Returns
- appendedIndex
Examples
>>> import cudf >>> idx = cudf.Index([1, 2, 10, 100]) >>> idx Int64Index([1, 2, 10, 100], dtype='int64') >>> other = cudf.Index([200, 400, 50]) >>> other Int64Index([200, 400, 50], dtype='int64') >>> idx.append(other) Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')
append accepts list of Index objects
>>> idx.append([other, other]) Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
-
argsort(ascending=True, **kwargs)¶ Return the integer indices that would sort the index.
- Parameters
- ascendingbool, default True
If True, returns the indices for ascending order. If False, returns the indices for descending order.
- Returns
- arrayA cupy array containing integer indices that would sort the index if used as an indexer.
-
asin()¶ Get Trigonometric inverse sine, element-wise.
The inverse of sine so that, if y = x.sin(), then x = y.asin() for x in [-π/2, π/2].
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5]) >>> ser.asin() 0 -1.570796 1 0.000000 2 1.570796 3 0.330314 4 0.523599 dtype: float64
asin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5], ... 'second': [0.234, 0.3, 0.1]}) >>> df first second 0 -1.0 0.234 1 0.0 0.300 2 0.5 0.100 >>> df.asin() first second 0 -1.570796 0.236190 1 0.000000 0.304693 2 0.523599 0.100167
asin operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64') >>> index.asin() Float64Index([-1.5707963267948966, 0.41151684606748806, 1.5707963267948966, 0.3046926540153975], dtype='float64')
-
astype(dtype, copy=False)¶ Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.
- Parameters
- dtypenumpy dtype
Use a numpy.dtype to cast entire Index object to.
- copybool, default False
If copy is True, a newly allocated object is always returned. If copy is False (the default) and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.
- Returns
- Index
Index with values cast to specified dtype.
-
atan()¶ Get Trigonometric inverse tangent, element-wise.
The inverse of tan so that, if y = x.tan(), then x = y.atan() for x in (-π/2, π/2).
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10]) >>> ser 0 -1.00000 1 0.00000 2 1.00000 3 0.32434 4 0.50000 5 -10.00000 dtype: float64 >>> ser.atan() 0 -0.785398 1 0.000000 2 0.785398 3 0.313635 4 0.463648 5 -1.471128 dtype: float64
atan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5], ... 'second': [0.234, 0.3, 10]}) >>> df first second 0 -1.0 0.234 1 -10.0 0.300 2 0.5 10.000 >>> df.atan() first second 0 -0.785398 0.229864 1 -1.471128 0.291457 2 0.463648 1.471128
atan operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64') >>> index.atan() Float64Index([-0.7853981633974483, 0.3805063771123649, 0.7853981633974483, 0.0, 0.2914567944778671], dtype='float64')
-
clip(lower=None, upper=None, inplace=False, axis=1)¶ Trim values at input threshold(s).
Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.
- Parameters
- lowerscalar or array_like, default None
Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.
- upperscalar or array_like, default None
Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series/Index, upper is expected to be a scalar or an array of size 1.
- inplacebool, default False
Whether to perform the operation in place on the data.
- Returns
- Clipped DataFrame/Series/Index/MultiIndex
Examples
>>> import cudf >>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']}) >>> df.clip(lower=[2, 'b'], upper=[3, 'c']) a b 0 2 b 1 2 b 2 3 c 3 3 c
>>> df.clip(lower=None, upper=[3, 'c']) a b 0 1 a 1 2 b 2 3 c 3 3 c
>>> df.clip(lower=[2, 'b'], upper=None) a b 0 2 b 1 2 b 2 3 c 3 4 d
>>> df.clip(lower=2, upper=3, inplace=True) >>> df a b 0 2 2 1 2 3 2 3 3 3 3 3
>>> import cudf >>> sr = cudf.Series([1, 2, 3, 4]) >>> sr.clip(lower=2, upper=3) 0 2 1 2 2 3 3 3 dtype: int64
>>> sr.clip(lower=None, upper=3) 0 1 1 2 2 3 3 3 dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True) >>> sr 0 2 1 2 2 3 3 4 dtype: int64
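With scalar thresholds, clipping is a clamp into [lower, upper], where a None bound disables that side. A list-based sketch with a hypothetical `clip` helper (not cuDF's code); because it only uses comparisons, it also works for strings, matching column b in the DataFrame example.

```python
def clip(values, lower=None, upper=None):
    """Hypothetical helper: clamp each value into [lower, upper]."""
    result = []
    for v in values:
        if lower is not None and v < lower:
            v = lower  # below the floor: raise to lower
        if upper is not None and v > upper:
            v = upper  # above the ceiling: lower to upper
        result.append(v)
    return result


sr = [1, 2, 3, 4]
print(clip(sr, lower=2, upper=3))                # [2, 2, 3, 3]
print(clip(sr, upper=3))                         # [1, 2, 3, 3]
print(clip(sr, lower=2))                         # [2, 2, 3, 4]
print(clip(["a", "b", "c", "d"], "b", "c"))      # ['b', 'b', 'c', 'c']
```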
-
copy(deep=True)¶ Make a copy of this object.
- Parameters
- deepbool, default True
Make a deep copy of the data. With deep=False, the data is not copied.
- Returns
- copyIndex
-
cos()¶ Get Trigonometric cosine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360]) >>> ser 0 0.00000 1 0.32434 2 0.50000 3 45.00000 4 90.00000 5 180.00000 6 360.00000 dtype: float64 >>> ser.cos() 0 1.000000 1 0.947861 2 0.877583 3 0.525322 4 -0.448074 5 -0.598460 6 -0.283691 dtype: float64
cos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15], ... 'second': [100.0, 360, 720, 300]}) >>> df first second 0 0.0 100.0 1 5.0 360.0 2 10.0 720.0 3 15.0 300.0 >>> df.cos() first second 0 1.000000 0.862319 1 0.283662 -0.283691 2 -0.839072 -0.839039 3 -0.759688 -0.022097
cos operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90]) >>> index Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64') >>> index.cos() Float64Index([ 0.9210609940028851, 0.8623188722876839, -0.5984600690578581, -0.4480736161291701], dtype='float64')
-
difference(other, sort=None)¶ Return a new Index with elements from the index that are not in other.
This is the set difference of two Index objects.
- Parameters
- otherIndex or array-like
- sortFalse or None, default None
Whether to sort the resulting index. By default, cuDF attempts to sort the values but catches any TypeError raised by comparing incomparable elements.
None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.
False : Do not sort the result.
- Returns
- differenceIndex
Examples
>>> import cudf >>> idx1 = cudf.Index([2, 1, 3, 4]) >>> idx1 Int64Index([2, 1, 3, 4], dtype='int64') >>> idx2 = cudf.Index([3, 4, 5, 6]) >>> idx2 Int64Index([3, 4, 5, 6], dtype='int64') >>> idx1.difference(idx2) Int64Index([1, 2], dtype='int64') >>> idx1.difference(idx2, sort=False) Int64Index([2, 1], dtype='int64')
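The set-difference rule and the two sort options can be sketched with Python sets and lists. This is a minimal illustration, assuming hashable elements and pandas-style set semantics (each surviving value appears once); the helper index_difference is hypothetical and is not how cuDF computes the result on the GPU.

```python
def index_difference(left, right, sort=None):
    """Hypothetical helper: unique elements of `left` that are not in `right`."""
    exclude = set(right)
    seen = set()
    result = []
    for value in left:
        if value not in exclude and value not in seen:
            seen.add(value)
            result.append(value)  # keep first-occurrence order
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            pass  # incomparable elements: fall back to the unsorted order
    return result


print(index_difference([2, 1, 3, 4], [3, 4, 5, 6]))              # [1, 2]
print(index_difference([2, 1, 3, 4], [3, 4, 5, 6], sort=False))  # [2, 1]
print(index_difference(["a", 1], []))                            # ['a', 1]
```

The last call shows the sort=None fallback: comparing a str with an int raises TypeError, so the unsorted order is kept.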
-
drop_duplicates(keep='first')¶ Return Index with duplicate values removed.
- Parameters
- keep{‘first’, ‘last’, False}, default ‘first’
‘first’ : Drop duplicates except for the first occurrence.
‘last’ : Drop duplicates except for the last occurrence.
False : Drop all duplicates.
- Returns
- deduplicatedIndex
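The three keep options can be illustrated on a plain list. This sketch assumes order is preserved, which is a simplification: cuDF may return the surviving values in a different order. The helper drop_duplicates is hypothetical.

```python
def drop_duplicates(values, keep="first"):
    """Hypothetical helper mirroring the keep options described above."""
    if keep is False:
        counts = {}
        for v in values:
            counts[v] = counts.get(v, 0) + 1
        return [v for v in values if counts[v] == 1]  # drop every duplicated value
    if keep not in ("first", "last"):
        raise ValueError(f"keep must be 'first', 'last' or False, got {keep!r}")
    # For 'last', scan from the end so the final occurrence is the one kept.
    ordered = values if keep == "first" else list(reversed(values))
    seen = set()
    kept = []
    for v in ordered:
        if v not in seen:  # only the first occurrence in scan order survives
            seen.add(v)
            kept.append(v)
    return kept if keep == "first" else kept[::-1]


data = [3, 1, 3, 2, 1]
print(drop_duplicates(data))                # [3, 1, 2]
print(drop_duplicates(data, keep="last"))   # [3, 2, 1]
print(drop_duplicates(data, keep=False))    # [2]
```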
-
dropna(how='any')¶ Return an Index with null values removed.
- Parameters
- how{‘any’, ‘all’}, default ‘any’
If the Index is a MultiIndex, drop the value when any or all levels are NaN.
- Returns
- validIndex
Examples
>>> import cudf >>> index = cudf.Index(['a', None, 'b', 'c']) >>> index StringIndex(['a' None 'b' 'c'], dtype='object') >>> index.dropna() StringIndex(['a' 'b' 'c'], dtype='object')
Using dropna on a MultiIndex:
>>> midx = cudf.MultiIndex( ... levels=[[1, None, 4, None], [1, 2, 5]], ... codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]], ... names=["x", "y"], ... ) >>> midx MultiIndex(levels=[0 1 1 null 2 4 3 null dtype: int64, 0 1 1 2 2 5 dtype: int64], codes= x y 0 0 0 1 0 2 2 1 1 3 2 1 4 3 0) >>> midx.dropna() MultiIndex(levels=[0 1 1 4 dtype: int64, 0 1 1 2 2 5 dtype: int64], codes= x y 0 0 0 1 0 2 2 1 1)
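The MultiIndex example above can be checked by hand: each row is built by looking up its code in the matching level, and how decides whether one null level or all null levels remove the row. The sketch below assumes None is the only null marker (NaN is not handled); decode_rows and dropna_rows are hypothetical helpers, not cuDF functions.

```python
from typing import List, Optional, Sequence, Tuple


def decode_rows(levels: Sequence[Sequence[Optional[object]]],
                codes: Sequence[Sequence[int]]) -> List[Tuple[Optional[object], ...]]:
    """Turn per-level codes into one tuple per MultiIndex row."""
    columns = [[level[c] for c in level_codes]
               for level, level_codes in zip(levels, codes)]
    return list(zip(*columns))


def dropna_rows(rows, how="any"):
    """Hypothetical helper: drop rows containing None (treated as null)."""
    if how == "any":
        return [r for r in rows if all(v is not None for v in r)]
    if how == "all":
        return [r for r in rows if any(v is not None for v in r)]
    raise ValueError(f"how must be 'any' or 'all', got {how!r}")


rows = decode_rows([[1, None, 4, None], [1, 2, 5]],
                   [[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]])
print(rows)                      # [(1, 1), (1, 5), (None, 2), (4, 2), (None, 1)]
print(dropna_rows(rows))         # [(1, 1), (1, 5), (4, 2)]
print(dropna_rows(rows, "all"))  # all five rows: no row is null at every level
```

The how="any" result matches the codes printed by midx.dropna() above.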
-
property
dtype¶ dtype of the underlying values in GenericIndex.
-
property
empty¶ Indicator whether Index is empty.
True if Index is entirely empty (no items).
- Returns
- outbool
If Index is empty, return True, if not return False.
-
equals(other)¶ Determine if two Index objects contain the same elements.
- Returns
- out: bool
True if “other” is an Index and it has the same elements as calling index; False otherwise.
-
exp()¶ Get the exponential of all elements, element-wise.
Exponential is the inverse of the log function, so that x.exp().log() = x
- Returns
- DataFrame/Series/Index
Result of the element-wise exponential.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100]) >>> ser 0 -1.00000 1 0.00000 2 1.00000 3 0.32434 4 0.50000 5 -10.00000 6 100.00000 dtype: float64 >>> ser.exp() 0 3.678794e-01 1 1.000000e+00 2 2.718282e+00 3 1.383117e+00 4 1.648721e+00 5 4.539993e-05 6 2.688117e+43 dtype: float64
exp operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5], ... 'second': [0.234, 0.3, 10]}) >>> df first second 0 -1.0 0.234 1 -10.0 0.300 2 0.5 10.000 >>> df.exp() first second 0 0.367879 1.263644 1 0.000045 1.349859 2 1.648721 22026.465795
exp operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64') >>> index.exp() Float64Index([0.36787944117144233, 1.4918246976412703, 2.718281828459045, 1.0, 1.3498588075760032], dtype='float64')
-
fillna(value, downcast=None)¶ Fill null values with the specified value.
- Parameters
- valuescalar
Scalar value to use to fill nulls. This value cannot be list-like.
- downcastdict, default is None
This parameter is currently NON-FUNCTIONAL.
- Returns
- filledIndex
Examples
>>> import cudf >>> index = cudf.Index([1, 2, None, 4]) >>> index Int64Index([1, 2, null, 4], dtype='int64') >>> index.fillna(3) Int64Index([1, 2, 3, 4], dtype='int64')
-
find_label_range(first, last)¶ Find range that starts with first and ends with last, inclusively.
- Returns
- begin, end2-tuple of int
The starting index and the ending index. The last value occurs at position end - 1.
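For an index sorted in ascending order, the begin/end pair can be sketched with Python's bisect module. This assumes a sorted index and labels that compare with <; behavior on unsorted indexes or missing labels is not modeled, and the helper find_label_range here is a hypothetical stand-in rather than cuDF's implementation.

```python
from bisect import bisect_left, bisect_right


def find_label_range(sorted_labels, first, last):
    """Hypothetical helper for an ascending list of labels."""
    begin = bisect_left(sorted_labels, first)  # first position with label >= first
    end = bisect_right(sorted_labels, last)    # one past the last position with label <= last
    return begin, end


labels = [10, 20, 20, 30, 40]
begin, end = find_label_range(labels, 20, 30)
print(begin, end)         # 1 4
print(labels[begin:end])  # [20, 20, 30]; labels[end - 1] is the last match
```

Because end is exclusive, labels[begin:end] is exactly the inclusive range from first to last.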
-
classmethod
from_pandas(index, nan_as_null=None)¶ Convert from a Pandas Index.
- Parameters
- indexPandas Index object
A Pandas Index object which has to be converted to cuDF Index.
- nan_as_nullbool, Default None
If None or True, converts np.nan values to null values. If False, leaves np.nan values as is.
- Raises
- TypeError for invalid input type.
Examples
>>> import cudf >>> import pandas as pd >>> import numpy as np >>> data = [10, 20, 30, np.nan] >>> pdi = pd.Index(data) >>> cudf.core.index.Index.from_pandas(pdi) Float64Index([10.0, 20.0, 30.0, null], dtype='float64') >>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False) Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
-
get_level_values(level)¶ Return an Index of values for requested level.
This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.
- Parameters
- levelint or str
It is either the integer position or the name of the level.
- Returns
- Index
Calling object, as there is only one level in the Index.
See also
cudf.core.multiindex.MultiIndex.get_level_valuesGet values for a level of a MultiIndex.
Notes
For Index, level should be 0, since there are no multiple levels.
Examples
>>> import cudf >>> idx = cudf.core.index.StringIndex(["a","b","c"]) >>> idx.get_level_values(0) StringIndex(['a' 'b' 'c'], dtype='object')
-
get_slice_bound(label, side, kind)¶ Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.
- Parameters
- labelobject
- side{‘left’, ‘right’}
- kind{‘ix’, ‘loc’, ‘getitem’}
- Returns
- int
Index of label.
-
property
gpu_values¶ View the data as a numba device array object
-
interleave_columns()¶ Interleave Series columns of a table into a single column.
Converts the column major table cols into a row major column.
- Parameters
- colsinput Table containing columns to interleave.
- Returns
- The interleaved columns as a single column
Examples
>>> import cudf >>> df = cudf.DataFrame({0: ['A1', 'A2', 'A3'], 1: ['B1', 'B2', 'B3']}) >>> df 0 1 0 A1 B1 1 A2 B2 2 A3 B3 >>> df.interleave_columns() 0 A1 1 B1 2 A2 3 B2 4 A3 5 B3
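Interleaving is a column-major to row-major flatten: read row 0 across all columns, then row 1, and so on. A minimal plain-Python sketch, assuming equal-length columns; interleave_columns here is a hypothetical list-based helper, not the cuDF method.

```python
def interleave_columns(columns):
    """Hypothetical helper: flatten column-major data into one row-major list."""
    if len({len(c) for c in columns}) > 1:
        raise ValueError("all columns must have the same length")
    # zip(*columns) yields the rows; the comprehension concatenates them in order.
    return [value for row in zip(*columns) for value in row]


cols = [["A1", "A2", "A3"], ["B1", "B2", "B3"]]
print(interleave_columns(cols))  # ['A1', 'B1', 'A2', 'B2', 'A3', 'B3']
```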
-
property
is_monotonic¶ Alias for is_monotonic_increasing.
-
property
is_monotonic_decreasing¶ Return True if the index values are monotonically decreasing (only equal or decreasing).
-
property
is_monotonic_increasing¶ Return True if the index values are monotonically increasing (only equal or increasing).
-
property
is_unique¶ Return True if the index has unique values.
-
isin(values)¶ Return a boolean array where the index values are in values.
Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.
- Parameters
- valuesset, list-like, Index
Sought values.
- Returns
- is_containedcupy array
CuPy array of boolean values.
-
isna()¶ Identify missing values. Alias for isnull.
-
isnull()¶ Identify missing values.
-
join(other, how='left', level=None, return_indexers=False, sort=False)¶ Compute join_index and indexers to conform data structures to the new index.
- Parameters
- otherIndex.
- how{‘left’, ‘right’, ‘inner’, ‘outer’}
- return_indexersbool, default False
- sortbool, default False
Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).
- Returns: index
Examples
>>> import cudf >>> lhs = cudf.DataFrame( ... {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b'] ... ).index >>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index >>> lhs.join(rhs, how='inner') MultiIndex(levels=[0 1 1 3 dtype: int64, 0 2 1 4 dtype: int64], codes= a b 0 1 1 1 0 0)
-
log()¶ Get the natural logarithm of all elements, element-wise.
Natural logarithm is the inverse of the exp function, so that x.log().exp() = x
- Returns
- DataFrame/Series/Index
Result of the element-wise natural logarithm.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100]) >>> ser 0 -1.00000 1 0.00000 2 1.00000 3 0.32434 4 0.50000 5 -10.00000 6 100.00000 dtype: float64 >>> ser.log() 0 NaN 1 -inf 2 0.000000 3 -1.125963 4 -0.693147 5 NaN 6 4.605170 dtype: float64
log operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5], ... 'second': [0.234, 0.3, 10]}) >>> df first second 0 -1.0 0.234 1 -10.0 0.300 2 0.5 10.000 >>> df.log() first second 0 NaN -1.452434 1 NaN -1.203973 2 -0.693147 2.302585
log operation on Index:
>>> index = cudf.Index([10, 11, 500.0]) >>> index Float64Index([10.0, 11.0, 500.0], dtype='float64') >>> index.log() Float64Index([2.302585092994046, 2.3978952727983707, 6.214608098422191], dtype='float64')
-
mask(cond, other=None, inplace=False)¶ Replace values where the condition is True.
- Parameters
- condbool Series/DataFrame, array-like
Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.
- other: scalar, list of scalars, Series/DataFrame
Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.
For a DataFrame, other must be a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.
For a Series, other must be a scalar or a Series-like object of the same length.
- inplacebool, default False
Whether to perform the operation in place on the data.
- Returns
- Same type as caller
Examples
>>> import cudf >>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]}) >>> df.mask(df % 2 == 0, [-1, -1]) A B 0 1 3 1 -1 5 2 5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0]) >>> ser.mask(ser > 2, 10) 0 10 1 10 2 2 3 1 4 0 dtype: int64 >>> ser.mask(ser > 2) 0 null 1 null 2 2 3 1 4 0 dtype: int64
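mask is the complement of where: mask(cond, other) replaces exactly the entries that where(~cond, other) would. A plain-Python sketch with a scalar other, using None to stand in for null; the helpers where and mask below are hypothetical list functions, not the cuDF methods.

```python
def where(values, cond, other=None):
    """Hypothetical helper: keep values where cond is True, else use other."""
    return [v if c else other for v, c in zip(values, cond)]


def mask(values, cond, other=None):
    """Hypothetical helper: replace values where cond is True."""
    return where(values, [not c for c in cond], other)


ser = [4, 3, 2, 1, 0]
gt2 = [v > 2 for v in ser]
print(mask(ser, gt2, 10))   # [10, 10, 2, 1, 0]
print(mask(ser, gt2))       # [None, None, 2, 1, 0]
print(where(ser, gt2, 15))  # [4, 3, 15, 15, 15]
```

The outputs match the Series.mask examples above and the Index.where example later in this section.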
-
max()¶ Return the maximum value of the Index.
- Returns
- scalar
Maximum value.
See also
Index.minReturn the minimum value in an Index.
cudf.core.series.Series.maxReturn the maximum value in a Series.
cudf.core.dataframe.DataFrame.maxReturn the maximum values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.max() 3
-
memory_usage(deep=False)¶ Memory usage of the values.
- Parameters
- deepbool
Introspect the data deeply, interrogate object dtypes for system-level memory consumption.
- Returns
- bytes used
-
min()¶ Return the minimum value of the Index.
- Returns
- scalar
Minimum value.
See also
Index.maxReturn the maximum value in an Index.
cudf.core.series.Series.minReturn the minimum value in a Series.
cudf.core.dataframe.DataFrame.minReturn the minimum values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.min() 1
-
property
name¶ Returns the name of the Index.
-
property
names¶ Returns a tuple containing the name of the Index.
-
property
ndim¶ Dimension of the data. Apart from MultiIndex ndim is always 1.
-
notna()¶ Identify non-missing values. Alias for notnull.
-
notnull()¶ Identify non-missing values.
-
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)¶ Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.
- Parameters
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
Index to direct ranking.
- method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’
How to rank the group of records that have the same value (i.e. ties):
* average: average rank of the group
* min: lowest rank in the group
* max: highest rank in the group
* first: ranks assigned in the order they appear in the array
* dense: like ‘min’, but rank always increases by 1 between groups.
- numeric_onlybool, optional
For DataFrame objects, rank only numeric columns if set to True.
- na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’
How to rank NaN values:
* keep: assign NaN rank to NaN values
* top: assign smallest rank to NaN values if ascending
* bottom: assign highest rank to NaN values if ascending.
- ascendingbool, default True
Whether or not the elements should be ranked in ascending order.
- pctbool, default False
Whether or not to display the returned rankings in percentile form.
- Returns
- same type as caller
Return a Series or DataFrame with data ranks as values.
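The five tie-breaking methods differ only in which rank a tie group receives. The sketch below ranks a plain list, assuming no missing values (na_option and pct are not modeled); the helper rank is hypothetical and only illustrates the rules listed above.

```python
def rank(values, method="average", ascending=True):
    """Hypothetical helper: 1-based ranks with the tie-breaking rules above."""
    if method not in ("average", "min", "max", "first", "dense"):
        raise ValueError(f"unknown method {method!r}")
    n = len(values)
    # Stable sort (also with reverse=True): equal values keep their original
    # order, which is what 'first' relies on.
    order = sorted(range(n), key=lambda i: values[i], reverse=not ascending)
    ranks = [0.0] * n
    dense_rank = 0
    start = 0
    while start < n:
        # Extend the tie group [start, stop] while the sorted values are equal.
        stop = start
        while stop + 1 < n and values[order[stop + 1]] == values[order[start]]:
            stop += 1
        dense_rank += 1
        for pos in range(start, stop + 1):
            if method == "average":
                r = (start + stop) / 2 + 1
            elif method == "min":
                r = start + 1
            elif method == "max":
                r = stop + 1
            elif method == "first":
                r = pos + 1
            else:  # dense
                r = dense_rank
            ranks[order[pos]] = float(r)
        start = stop + 1
    return ranks


values = [10, 20, 10, 30]
for m in ("average", "min", "max", "first", "dense"):
    print(m, rank(values, m))
# average [1.5, 3.0, 1.5, 4.0]
# min [1.0, 3.0, 1.0, 4.0]
# max [2.0, 3.0, 2.0, 4.0]
# first [1.0, 3.0, 2.0, 4.0]
# dense [1.0, 2.0, 1.0, 3.0]
```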
-
rename(name, inplace=False)¶ Alter Index name.
Defaults to returning new index.
- Parameters
- namelabel
Name(s) to set.
- Returns
- Index
-
repeat(repeats, axis=None)¶ Repeats elements consecutively.
Returns a new object of the caller's type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.
- Parameters
- repeatsint, or array of ints
The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.
- Returns
- Series/DataFrame/Index
A newly created object of same type as caller with repeated elements.
Examples
>>> import cudf >>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]}) >>> df a b 0 1 10 1 2 20 2 3 30 >>> df.repeat(3) a b 0 1 10 0 1 10 0 1 10 1 2 20 1 2 20 1 2 20 2 3 30 2 3 30 2 3 30
Repeat on Series
>>> s = cudf.Series([0, 2]) >>> s 0 0 1 2 dtype: int64 >>> s.repeat([3, 4]) 0 0 0 0 0 0 1 2 1 2 1 2 1 2 dtype: int64 >>> s.repeat(2) 0 0 0 0 1 2 1 2 dtype: int64
Repeat on Index
>>> index = cudf.Index([10, 22, 33, 55]) >>> index Int64Index([10, 22, 33, 55], dtype='int64') >>> index.repeat(5) Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33, 33, 33, 33, 33, 55, 55, 55, 55, 55], dtype='int64')
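The two forms of repeats, a single int or one count per element, can be sketched on a plain list. This is an illustration only; the helper repeat is hypothetical, and index labels (which the Series example above also repeats) are not modeled.

```python
def repeat(values, repeats):
    """Hypothetical helper: repeat each element consecutively."""
    if isinstance(repeats, int):
        repeats = [repeats] * len(values)  # same count for every element
    if len(repeats) != len(values):
        raise ValueError("repeats must be an int or match the length of values")
    if any(r < 0 for r in repeats):
        raise ValueError("repeats must be non-negative")
    return [v for v, r in zip(values, repeats) for _ in range(r)]


print(repeat([0, 2], [3, 4]))  # [0, 0, 0, 2, 2, 2, 2]
print(repeat([0, 2], 2))       # [0, 0, 2, 2]
print(repeat([0, 2], 0))       # []
```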
-
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)¶ Return a random sample of items from an axis of object.
You can use random_state for reproducibility.
- Parameters
- nint, optional
Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.
- fracfloat, optional
Fraction of axis items to return. Cannot be used with n.
- replacebool, default False
Allow or disallow sampling of the same row more than once. replace == True is not yet supported for axis = 1/“columns”.
- weightsstr or ndarray-like, optional
Only supported for axis=1/“columns”.
- random_stateint or None, default None
Seed for the random number generator (if int), or None. If None, a random seed will be chosen.
- axis{0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index don’t support axis=1.
- Returns
- Series or DataFrame or Index
A new object of same type as caller containing n items randomly sampled from the caller object.
Examples
>>> import cudf >>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]}) >>> df.sample(3) a 1 2 3 4 0 1
>>> sr = cudf.Series([1, 2, 3, 4, 5]) >>> sr.sample(10, replace=True) 1 4 3 1 2 4 0 5 0 1 4 5 4 1 0 2 0 3 3 2 dtype: int64
>>> df = cudf.DataFrame( ... {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]}) >>> df.sample(2, axis=1) a c 0 1 3 1 2 4
-
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)¶ Scatter to a list of dataframes.
Uses map_index to determine the destination of each row of the original DataFrame.
- Parameters
- map_indexSeries, str or list-like
Scatter assignment for each row
- map_sizeint
Length of output list. Must be greater than or equal to the number of unique values in map_index.
- keep_indexbool
Conserve original index values for each row
- Returns
- A list of cudf.DataFrame objects.
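Scattering by a map routes each row to the output bucket named by its map_index entry. The sketch assumes destinations are integers in the range [0, map_size) and that rows keep their relative order inside each bucket; scatter_by_map here is a hypothetical list-based helper, and column-name map_index values and keep_index are not modeled.

```python
def scatter_by_map(rows, map_index, map_size=None):
    """Hypothetical helper: split rows into map_size lists by destination."""
    if len(rows) != len(map_index):
        raise ValueError("map_index must have one entry per row")
    if map_size is None:
        map_size = max(map_index, default=-1) + 1
    buckets = [[] for _ in range(map_size)]
    for row, dest in zip(rows, map_index):
        if not 0 <= dest < map_size:
            raise ValueError(f"destination {dest} is outside [0, {map_size})")
        buckets[dest].append(row)  # rows keep their original relative order
    return buckets


rows = ["r0", "r1", "r2", "r3", "r4"]
print(scatter_by_map(rows, [1, 0, 1, 2, 0]))
# [['r1', 'r4'], ['r0', 'r2'], ['r3']]
```

Passing a map_size larger than needed simply yields trailing empty lists.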
-
searchsorted(values, side='left', ascending=True, na_position='last')¶ Find indices where elements should be inserted to maintain order.
- Parameters
- valuesFrame (Shape must be consistent with self)
Values to be hypothetically inserted into self.
- sidestr {‘left’, ‘right’} optional, default ‘left’
If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.
- ascendingbool optional, default True
Sorted Frame is in ascending order (otherwise descending)
- na_positionstr {‘last’, ‘first’} optional, default ‘last’
Position of null values in sorted order
- Returns
- 1-D cupy array of insertion points
Examples
>>> s = cudf.Series([1, 2, 3]) >>> s.searchsorted(4) 3 >>> s.searchsorted([0, 4]) array([0, 3], dtype=int32) >>> s.searchsorted([1, 3], side='left') array([0, 2], dtype=int32) >>> s.searchsorted([1, 3], side='right') array([1, 3], dtype=int32)
If the values are not monotonically sorted, wrong locations may be returned:
>>> s = cudf.Series([2, 1, 3]) >>> s.searchsorted(1) 0 # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]}) >>> df a b 0 1 10 1 3 12 2 5 14 3 7 16 >>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6], ... 'b': [10, 11, 13, 15]}) >>> values_df a b 0 0 10 1 2 11 2 5 13 3 6 15 >>> df.searchsorted(values_df, ascending=False) array([4, 4, 4, 0], dtype=int32)
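For a single ascending column without nulls, side maps directly onto Python's bisect_left and bisect_right. This sketch assumes exactly that case; ascending=False, na_position, and multi-column frames are not modeled, and searchsorted here is a hypothetical list-based helper.

```python
from bisect import bisect_left, bisect_right


def searchsorted(sorted_values, values, side="left"):
    """Hypothetical helper: insertion points that keep sorted_values ordered."""
    if side not in ("left", "right"):
        raise ValueError(f"side must be 'left' or 'right', got {side!r}")
    find = bisect_left if side == "left" else bisect_right
    if isinstance(values, (list, tuple)):
        return [find(sorted_values, v) for v in values]
    return find(sorted_values, values)


s = [1, 2, 3]
print(searchsorted(s, 4))                     # 3
print(searchsorted(s, [0, 4]))                # [0, 3]
print(searchsorted(s, [1, 3], side="left"))   # [0, 2]
print(searchsorted(s, [1, 3], side="right"))  # [1, 3]
print(searchsorted([2, 1, 3], 1))             # 0: wrong, because the input is not sorted
```

The results match the Series examples above, including the incorrect answer for unsorted input.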
-
property
shape¶ Returns a tuple representing the dimensionality of the Index.
-
shift(periods=1, freq=None, axis=0, fill_value=None)¶ Shift values by periods positions.
-
sin()¶ Get Trigonometric sine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360]) >>> ser 0 0.00000 1 0.32434 2 0.50000 3 45.00000 4 90.00000 5 180.00000 6 360.00000 dtype: float64 >>> ser.sin() 0 0.000000 1 0.318683 2 0.479426 3 0.850904 4 0.893997 5 -0.801153 6 0.958916 dtype: float64
sin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15], ... 'second': [100.0, 360, 720, 300]}) >>> df first second 0 0.0 100.0 1 5.0 360.0 2 10.0 720.0 3 15.0 300.0 >>> df.sin() first second 0 0.000000 -0.506366 1 -0.958924 0.958916 2 -0.544021 -0.544072 3 0.650288 -0.999756
sin operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90]) >>> index Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64') >>> index.sin() Float64Index([-0.3894183423086505, -0.5063656411097588, 0.8011526357338306, 0.8939966636005579], dtype='float64')
-
property
size¶ Return the number of elements in the underlying data.
- Returns
- sizeSize of the DataFrame / Index / Series / MultiIndex
Examples
Size of an empty dataframe is 0.
>>> import cudf >>> df = cudf.DataFrame() >>> df Empty DataFrame Columns: [] Index: [] >>> df.size 0 >>> df = cudf.DataFrame(index=[1, 2, 3]) >>> df Empty DataFrame Columns: [] Index: [1, 2, 3] >>> df.size 0
DataFrame with values
>>> df = cudf.DataFrame({'a': [10, 11, 12], ... 'b': ['hello', 'rapids', 'ai']}) >>> df a b 0 10 hello 1 11 rapids 2 12 ai >>> df.size 6 >>> df.index RangeIndex(start=0, stop=3) >>> df.index.size 3
Size of an Index
>>> index = cudf.Index([]) >>> index Float64Index([], dtype='float64') >>> index.size 0 >>> index = cudf.Index([1, 2, 3, 10]) >>> index Int64Index([1, 2, 3, 10], dtype='int64') >>> index.size 4
Size of a MultiIndex
>>> midx = cudf.MultiIndex( ... levels=[["a", "b", "c", None], ["1", None, "5"]], ... codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]], ... names=["x", "y"], ... ) >>> midx MultiIndex(levels=[0 a 1 b 2 c 3 None dtype: object, 0 1 1 None 2 5 dtype: object], codes= x y 0 0 0 1 0 2 2 1 1 3 2 1 4 3 0) >>> midx.size 5
-
sort_values(return_indexer=False, ascending=True, key=None)¶ Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
- Parameters
- return_indexerbool, default False
Should the indices that would sort the index be returned.
- ascendingbool, default True
Should the index values be sorted in an ascending order.
- keyNone, optional
This parameter is NON-FUNCTIONAL.
- Returns
- sorted_indexIndex
Sorted copy of the index.
- indexercupy.ndarray, optional
The indices that the index itself was sorted by.
See also
cudf.core.series.Series.sort_valuesSort values of a Series.
cudf.core.dataframe.DataFrame.sort_valuesSort values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([10, 100, 1, 1000]) >>> idx Int64Index([10, 100, 1, 1000], dtype='int64')
Sort values in ascending order (default behavior):
>>> idx.sort_values() Int64Index([1, 10, 100, 1000], dtype='int64')
Sort values in descending order, and also get the indices idx was sorted by:
>>> idx.sort_values(ascending=False, return_indexer=True) (Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2], dtype=int32))
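The indexer returned with return_indexer=True is an argsort: the original positions listed in sorted order. A plain-Python sketch, with lists standing in for the Index and the cupy array; sort_values here is a hypothetical helper.

```python
def sort_values(values, ascending=True, return_indexer=False):
    """Hypothetical helper: sorted copy, optionally with the sorting positions."""
    indexer = sorted(range(len(values)), key=values.__getitem__,
                     reverse=not ascending)
    sorted_values = [values[i] for i in indexer]  # gather by the indexer
    if return_indexer:
        return sorted_values, indexer
    return sorted_values


idx = [10, 100, 1, 1000]
print(sort_values(idx))  # [1, 10, 100, 1000]
print(sort_values(idx, ascending=False, return_indexer=True))
# ([1000, 100, 10, 1], [3, 1, 0, 2])
```

Gathering the original values by the indexer reproduces the sorted copy, which is why the two return values are consistent.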
-
sqrt()¶ Get the non-negative square-root of all elements, element-wise.
- Returns
- DataFrame/Series/Index
Result of the non-negative square-root of each element.
Examples
>>> import cudf >>> ser = cudf.Series([10, 25, 81, 1.0, 100]) >>> ser 0 10.0 1 25.0 2 81.0 3 1.0 4 100.0 dtype: float64 >>> ser.sqrt() 0 3.162278 1 5.000000 2 9.000000 3 1.000000 4 10.000000 dtype: float64
sqrt operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-10.0, 100, 625], ... 'second': [1, 2, 0.4]}) >>> df first second 0 -10.0 1.0 1 100.0 2.0 2 625.0 0.4 >>> df.sqrt() first second 0 NaN 1.000000 1 10.0 1.414214 2 25.0 0.632456
sqrt operation on Index:
>>> index = cudf.Index([-10.0, 100, 625]) >>> index Float64Index([-10.0, 100.0, 625.0], dtype='float64') >>> index.sqrt() Float64Index([nan, 10.0, 25.0], dtype='float64')
-
sum()¶ Return the sum of all values of the Index.
- Returns
- scalar
Sum of all values.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.sum() 6
-
take(indices)¶ Gather only the specific subset of indices
- Parameters
- indices: An array-like that maps to values contained in this Index.
-
tan()¶ Get Trigonometric tangent, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360]) >>> ser 0 0.00000 1 0.32434 2 0.50000 3 45.00000 4 90.00000 5 180.00000 6 360.00000 dtype: float64 >>> ser.tan() 0 0.000000 1 0.336213 2 0.546302 3 1.619775 4 -1.995200 5 1.338690 6 -3.380140 dtype: float64
tan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15], ... 'second': [100.0, 360, 720, 300]}) >>> df first second 0 0.0 100.0 1 5.0 360.0 2 10.0 720.0 3 15.0 300.0 >>> df.tan() first second 0 0.000000 -0.587214 1 -3.380515 -3.380140 2 0.648361 0.648446 3 -0.855993 45.244742
tan operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90]) >>> index Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64') >>> index.tan() Float64Index([-0.4227932187381618, -0.587213915156929, -1.3386902103511544, -1.995200412208242], dtype='float64')
-
tile(count)¶ Repeats the rows from self DataFrame count times to form a new DataFrame.
- Parameters
- selfinput Table whose rows are to be tiled.
- countNumber of times to tile “rows”. Must be non-negative.
- Returns
- The table containing the tiled “rows”.
Examples
>>> import cudf >>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]]) >>> count = 2 >>> df.tile(count) 0 1 2 0 8 4 7 1 5 2 3 0 8 4 7 1 5 2 3
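Tiling repeats the whole block of rows, which differs from repeat, where each element is repeated in place. A plain-Python sketch with rows as lists; tile here is a hypothetical helper, and index labels are not modeled.

```python
def tile(rows, count):
    """Hypothetical helper: stack the full set of rows `count` times."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return [list(row) for _ in range(count) for row in rows]


rows = [[8, 4, 7], [5, 2, 3]]
print(tile(rows, 2))  # [[8, 4, 7], [5, 2, 3], [8, 4, 7], [5, 2, 3]]
```

For the same input, repeating each row twice consecutively would instead give [8, 4, 7], [8, 4, 7], [5, 2, 3], [5, 2, 3].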
-
to_array(fillna=None)¶ Get a dense numpy array for the data.
- Parameters
- fillnastr or None
Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.
Notes
If fillna is None, null values are skipped. Therefore, the output size could be smaller.
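The two fillna modes can be sketched with None standing in for null and math.nan for NaN. This is an illustration under those assumptions, not cuDF's conversion code; the helper to_array is hypothetical and returns lists rather than NumPy arrays.

```python
import math


def to_array(values, fillna=None):
    """Hypothetical helper: None stands in for a null value."""
    if fillna is None:
        # Nulls are skipped, so the result can be shorter than the input.
        return [v for v in values if v is not None]
    if fillna == "pandas":
        # Nulls become NaN; values are converted to float so NaN can be stored.
        return [math.nan if v is None else float(v) for v in values]
    raise ValueError(f"fillna must be None or 'pandas', got {fillna!r}")


print(to_array([1, None, 3]))            # [1, 3]
print(to_array([1, None, 3], "pandas"))  # [1.0, nan, 3.0]
```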
-
to_arrow()¶ Convert Index to a PyArrow Array.
Examples
>>> import cudf >>> idx = cudf.Index([-3, 10, 15, 20]) >>> idx.to_arrow() <pyarrow.lib.Int64Array object at 0x7fcaa6f53440> [ -3, 10, 15, 20 ]
-
to_dlpack()¶ Converts a cuDF object into a DLPack tensor.
DLPack is an open-source memory tensor structure: dmlc/dlpack.
This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.
- Parameters
- cudf_objDataFrame, Series, Index, or Column
- Returns
- pycapsule_objPyCapsule
Output DLPack tensor pointer which is encapsulated in a PyCapsule object.
-
to_frame(index=True, name=None)¶ Create a DataFrame with a column containing this Index
- Parameters
- indexboolean, default True
Set the index of the returned DataFrame as the original Index
- namestr, default None
Name to be used for the column
- Returns
- DataFrame
cudf DataFrame
-
to_pandas()¶ Convert to a Pandas Index.
Examples
>>> import cudf >>> idx = cudf.Index([-3, 10, 15, 20]) >>> idx Int64Index([-3, 10, 15, 20], dtype='int64') >>> idx.to_pandas() Int64Index([-3, 10, 15, 20], dtype='int64') >>> type(idx.to_pandas()) <class 'pandas.core.indexes.numeric.Int64Index'> >>> type(idx) <class 'cudf.core.index.GenericIndex'>
-
to_series(index=None, name=None)¶ Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.
- Parameters
- indexIndex, optional
Index of resulting Series. If None, defaults to original index.
- namestr, optional
Name of resulting Series. If None, defaults to name of original index.
- Returns
- Series
The dtype will be based on the type of the Index values.
-
unique()¶ Return unique values in the index.
- Returns
- Index without duplicates
-
property
values¶ Return an array representing the data in the Index.
- Returns
- arrayA cupy array of data in the Index.
Examples
>>> import cudf >>> index = cudf.Index([1, -10, 100, 20]) >>> index.values array([ 1, -10, 100, 20]) >>> type(index.values) <class 'cupy.core.core.ndarray'>
-
property
values_host¶ Return a numpy representation of the Index.
Only the values in the Index will be returned.
- Returns
- outnumpy.ndarray
The values of the Index.
Examples
>>> import cudf >>> index = cudf.Index([1, -10, 100, 20]) >>> index.values_host array([ 1, -10, 100, 20]) >>> type(index.values_host) <class 'numpy.ndarray'>
-
where(cond, other=None)¶ Replace values where the condition is False.
- Parameters
- condbool array-like with the same length as self
Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.
- other: scalar, or array-like
Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.
- Returns
- Same type as caller
Examples
>>> import cudf >>> index = cudf.Index([4, 3, 2, 1, 0]) >>> index Int64Index([4, 3, 2, 1, 0], dtype='int64') >>> index.where(index > 2, 15) Int64Index([4, 3, 15, 15, 15], dtype='int64')
CategoricalIndex¶
-
class
cudf.core.index.CategoricalIndex(data=None, categories=None, ordered=None, dtype=None, copy=False, name=None)¶ Immutable, ordered and sliceable sequence of integer labels. The basic object storing row labels for all cuDF objects.
- Parameters
- dataarray-like (1-dimensional)/ DataFrame
If it is a DataFrame, it will return a MultiIndex
- dtypeNumPy dtype (default: object)
If dtype is None, we find the dtype that best fits the data.
- copybool
Make a copy of input data.
- nameobject
Name to be stored in the index.
- tupleize_colsbool (default: True)
When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.
- Returns
- Index
cudf Index
Examples
>>> import cudf >>> cudf.Index([1, 2, 3], dtype="uint64", name="a") UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]})) MultiIndex(levels=[0 1 1 2 dtype: int64, 0 2 1 3 dtype: int64], codes= a b 0 0 0 1 1 1)
- Attributes
categoriesThe categories of this categorical.
codesThe category codes of this categorical.
dtypedtype of the underlying values in GenericIndex.
emptyIndicator whether Index is empty.
gpu_valuesView the data as a numba device array object
is_monotonicAlias for is_monotonic_increasing.
is_monotonic_decreasingReturn True if the index values are monotonically decreasing (only equal or decreasing).
is_monotonic_increasingReturn True if the index values are monotonically increasing (only equal or increasing).
is_uniqueReturn True if the index has unique values.
nameReturns the name of the Index.
namesReturns a tuple containing the name of the Index.
ndimDimension of the data.
shapeReturns a tuple representing the dimensionality of the Index.
sizeReturn the number of elements in the underlying data.
valuesReturn an array representing the data in the Index.
values_hostReturn a numpy representation of the Index.
Methods
acos()Get Trigonometric inverse cosine, element-wise.
any()Return whether any element is True in the Index.
append(other)Append a collection of Index options together.
argsort([ascending])Return the integer indices that would sort the index.
asin()Get Trigonometric inverse sine, element-wise.
astype(dtype[, copy])Create an Index with values cast to dtypes.
atan()Get Trigonometric inverse tangent, element-wise.
clip([lower, upper, inplace, axis])Trim values at input threshold(s).
copy([deep])Make a copy of this object.
cos()Get Trigonometric cosine, element-wise.
difference(other[, sort])Return a new Index with elements from the index that are not in other.
drop_duplicates([keep])Return Index with duplicate values removed
dropna([how])Return an Index with null values removed.
equals(other)Determine if two Index objects contain the same elements.
exp()Get the exponential of all elements, element-wise.
fillna(value[, downcast])Fill null values with the specified value.
find_label_range(first, last)Find range that starts with first and ends with last, inclusively.
from_pandas(index[, nan_as_null])Convert from a Pandas Index.
get_level_values(level)Return an Index of values for requested level.
get_slice_bound(label, side, kind)Calculate slice bound that corresponds to given label.
interleave_columns()Interleave Series columns of a table into a single column.
isin(values)Return a boolean array where the index values are in values.
isna()Identify missing values.
isnull()Identify missing values.
join(other[, how, level, return_indexers, sort])Compute join_index and indexers to conform data structures to the new index.
log()Get the natural logarithm of all elements, element-wise.
mask(cond[, other, inplace])Replace values where the condition is True.
max()Return the maximum value of the Index.
memory_usage([deep])Memory usage of the values.
min()Return the minimum value of the Index.
notna()Identify non-missing values.
notnull()Identify non-missing values.
rank([axis, method, numeric_only, …])Compute numerical data ranks (1 through n) along axis.
rename(name[, inplace])Alter Index name.
repeat(repeats[, axis])Repeats elements consecutively.
sample([n, frac, replace, weights, …])Return a random sample of items from an axis of object.
scatter_by_map(map_index[, map_size, keep_index])Scatter to a list of dataframes.
searchsorted(values[, side, ascending, …])Find indices where elements should be inserted to maintain order
shift([periods, freq, axis, fill_value])Shift values by periods positions.
sin()Get Trigonometric sine, element-wise.
sort_values([return_indexer, ascending, key])Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
sqrt()Get the non-negative square-root of all elements, element-wise.
sum()Return the sum of all values of the Index.
take(indices)Gather only the specific subset of indices
tan()Get Trigonometric tangent, element-wise.
tile(count)Repeats the rows from self DataFrame count times to form a new DataFrame.
to_array([fillna])Get a dense numpy array for the data.
to_arrow()Convert Index to a PyArrow Array.
to_dlpack()Converts a cuDF object into a DLPack tensor.
to_frame([index, name])Create a DataFrame with a column containing this Index
to_pandas()Convert to a Pandas Index.
to_series([index, name])Create a Series with both index and values equal to the index keys.
unique()Return unique values in the index.
where(cond[, other])Replace values where the condition is False.
replace
-
acos()¶ Get Trigonometric inverse cosine, element-wise.
The inverse of cos so that, if y = x.cos(), then x = y.acos()
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5]) >>> ser.acos() 0 3.141593 1 1.570796 2 0.000000 3 1.240482 4 1.047198 dtype: float64
acos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5], ... 'second': [0.234, 0.3, 0.1]}) >>> df first second 0 -1.0 0.234 1 0.0 0.300 2 0.5 0.100 >>> df.acos() first second 0 3.141593 1.334606 1 1.570796 1.266104 2 1.047198 1.470629
acos operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64') >>> index.acos() Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0, 1.5707963267948966, 1.266103672779499], dtype='float64')
-
any()¶ Return whether any element in the Index is True.
-
append(other)¶ Append a collection of Index objects together.
- Parameters
- otherIndex or list/tuple of indices
- Returns
- appendedIndex
Examples
>>> import cudf >>> idx = cudf.Index([1, 2, 10, 100]) >>> idx Int64Index([1, 2, 10, 100], dtype='int64') >>> other = cudf.Index([200, 400, 50]) >>> other Int64Index([200, 400, 50], dtype='int64') >>> idx.append(other) Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')
append also accepts a list of Index objects:
>>> idx.append([other, other]) Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
-
argsort(ascending=True, **kwargs)¶ Return the integer indices that would sort the index.
- Parameters
- ascendingbool, default True
If True, returns the indices for ascending order. If False, returns the indices for descending order.
- Returns
- array
A cupy array containing integer indices that would sort the index if used as an indexer.
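The following pure-Python sketch illustrates what the returned positions mean. It is a CPU illustration, not cuDF's GPU implementation, and the function name `argsort_positions` is hypothetical. It relies on Python's stable sort, so equal values keep their original relative order in both directions. Whether cuDF orders ties the same way is not stated in this reference.

```python
def argsort_positions(values, ascending=True):
    """Return the integer positions that would sort `values` (hypothetical helper)."""
    # sorted() is stable even with reverse=True, so ties keep their input order.
    return sorted(range(len(values)), key=values.__getitem__, reverse=not ascending)


idx = [10, 100, 1, 1000]
order = argsort_positions(idx)
print(order)                                    # [2, 0, 1, 3]
print([idx[i] for i in order])                  # [1, 10, 100, 1000]
print(argsort_positions(idx, ascending=False))  # [3, 1, 0, 2]
```

Indexing the original values with the returned positions yields the sorted sequence. That is the sense in which the result "would sort the index if used as an indexer".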
-
asin()¶ Get Trigonometric inverse sine, element-wise.
The inverse of sine, so that if y = x.sin(), then x = y.asin() for x in [-π/2, π/2].
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5]) >>> ser.asin() 0 -1.570796 1 0.000000 2 1.570796 3 0.330314 4 0.523599 dtype: float64
asin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5], ... 'second': [0.234, 0.3, 0.1]}) >>> df first second 0 -1.0 0.234 1 0.0 0.300 2 0.5 0.100 >>> df.asin() first second 0 -1.570796 0.236190 1 0.000000 0.304693 2 0.523599 0.100167
asin operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64') >>> index.asin() Float64Index([-1.5707963267948966, 0.41151684606748806, 1.5707963267948966, 0.3046926540153975], dtype='float64')
-
astype(dtype, copy=False)¶ Create an Index with values cast to dtype. The class of the new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.
- Parameters
- dtypenumpy dtype
NumPy dtype to which the entire Index is cast.
- copybool, default False
If True, always return a newly allocated object. If False (the default) and internal requirements on dtype are satisfied, the original data is used to create a new Index, or the original Index is returned.
- Returns
- Index
Index with values cast to specified dtype.
-
atan()¶ Get Trigonometric inverse tangent, element-wise.
The inverse of tan, so that if y = x.tan(), then x = y.atan() for x in (-π/2, π/2).
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10]) >>> ser 0 -1.00000 1 0.00000 2 1.00000 3 0.32434 4 0.50000 5 -10.00000 dtype: float64 >>> ser.atan() 0 -0.785398 1 0.000000 2 0.785398 3 0.313635 4 0.463648 5 -1.471128 dtype: float64
atan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5], ... 'second': [0.234, 0.3, 10]}) >>> df first second 0 -1.0 0.234 1 -10.0 0.300 2 0.5 10.000 >>> df.atan() first second 0 -0.785398 0.229864 1 -1.471128 0.291457 2 0.463648 1.471128
atan operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64') >>> index.atan() Float64Index([-0.7853981633974483, 0.3805063771123649, 0.7853981633974483, 0.0, 0.2914567944778671], dtype='float64')
-
property
categories¶ The categories of this categorical.
-
clip(lower=None, upper=None, inplace=False, axis=1)¶ Trim values at input threshold(s).
Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.
- Parameters
- lowerscalar or array_like, default None
Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.
- upperscalar or array_like, default None
Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series/Index, upper is expected to be a scalar or an array of size 1.
- inplacebool, default False
Whether to perform the operation in place on the data.
- Returns
- Clipped DataFrame/Series/Index/MultiIndex
Examples
>>> import cudf >>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']}) >>> df.clip(lower=[2, 'b'], upper=[3, 'c']) a b 0 2 b 1 2 b 2 3 c 3 3 c
>>> df.clip(lower=None, upper=[3, 'c']) a b 0 1 a 1 2 b 2 3 c 3 3 c
>>> df.clip(lower=[2, 'b'], upper=None) a b 0 2 b 1 2 b 2 3 c 3 4 d
>>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":[2, 3, 4, 5]}) >>> df.clip(lower=2, upper=3, inplace=True) >>> df a b 0 2 2 1 2 3 2 3 3 3 3 3
>>> import cudf >>> sr = cudf.Series([1, 2, 3, 4]) >>> sr.clip(lower=2, upper=3) 0 2 1 2 2 3 3 3 dtype: int64
>>> sr.clip(lower=None, upper=3) 0 1 1 2 2 3 3 3 dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True) >>> sr 0 2 1 2 2 3 3 4 dtype: int64
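The following pure-Python sketch shows the scalar-threshold behavior from the Series examples above. It assumes non-null numeric values and is not cuDF's implementation. Array-like thresholds and null handling are not modeled, and the name `clip` here is just a local helper.

```python
def clip(values, lower=None, upper=None):
    """Clamp each value into [lower, upper]; a None bound disables that side."""
    out = []
    for v in values:
        if lower is not None and v < lower:
            v = lower  # below the minimum threshold: raised to it
        if upper is not None and v > upper:
            v = upper  # above the maximum threshold: lowered to it
        out.append(v)
    return out


print(clip([1, 2, 3, 4], lower=2, upper=3))     # [2, 2, 3, 3]
print(clip([1, 2, 3, 4], lower=None, upper=3))  # [1, 2, 3, 3]
print(clip([1, 2, 3, 4], lower=2))              # [2, 2, 3, 4]
```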
-
property
codes¶ The category codes of this categorical.
-
copy(deep=True)¶ Make a copy of this object.
- Parameters
- deepbool, default True
Make a deep copy of the data. With deep=False, the data is not copied.
- Returns
- copyIndex
-
cos()¶ Get Trigonometric cosine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360]) >>> ser 0 0.00000 1 0.32434 2 0.50000 3 45.00000 4 90.00000 5 180.00000 6 360.00000 dtype: float64 >>> ser.cos() 0 1.000000 1 0.947861 2 0.877583 3 0.525322 4 -0.448074 5 -0.598460 6 -0.283691 dtype: float64
cos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15], ... 'second': [100.0, 360, 720, 300]}) >>> df first second 0 0.0 100.0 1 5.0 360.0 2 10.0 720.0 3 15.0 300.0 >>> df.cos() first second 0 1.000000 0.862319 1 0.283662 -0.283691 2 -0.839072 -0.839039 3 -0.759688 -0.022097
cos operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90]) >>> index Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64') >>> index.cos() Float64Index([ 0.9210609940028851, 0.8623188722876839, -0.5984600690578581, -0.4480736161291701], dtype='float64')
-
difference(other, sort=None)¶ Return a new Index with elements from the index that are not in other.
This is the set difference of two Index objects.
- Parameters
- otherIndex or array-like
- sortFalse or None, default None
Whether to sort the resulting index. By default, the values are attempted to be sorted, but any TypeError from incomparable elements is caught by cudf.
None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.
False : Do not sort the result.
- Returns
- differenceIndex
Examples
>>> import cudf >>> idx1 = cudf.Index([2, 1, 3, 4]) >>> idx1 Int64Index([2, 1, 3, 4], dtype='int64') >>> idx2 = cudf.Index([3, 4, 5, 6]) >>> idx2 Int64Index([3, 4, 5, 6], dtype='int64') >>> idx1.difference(idx2) Int64Index([1, 2], dtype='int64') >>> idx1.difference(idx2, sort=False) Int64Index([2, 1], dtype='int64')
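The following pure-Python sketch shows how the sort parameter interacts with the set difference. It assumes the result is deduplicated and keeps first-appearance order when unsorted, which matches the example above. This is an illustration, not cuDF's GPU code.

```python
def difference(left, right, sort=None):
    """Unique elements of `left` that are not in `right`."""
    if sort is not None and sort is not False:
        raise ValueError("sort must be None or False")
    exclude = set(right)
    seen = set()
    result = []
    for v in left:
        if v not in exclude and v not in seen:
            seen.add(v)
            result.append(v)  # first-appearance order
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            pass  # incomparable elements: leave the result unsorted
    return result


print(difference([2, 1, 3, 4], [3, 4, 5, 6]))              # [1, 2]
print(difference([2, 1, 3, 4], [3, 4, 5, 6], sort=False))  # [2, 1]
```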
-
drop_duplicates(keep='first')¶ Return Index with duplicate values removed.
- Parameters
- keep{‘first’, ‘last’, False}, default ‘first’
- ‘first’ : Drop duplicates except for the first occurrence.
- ‘last’ : Drop duplicates except for the last occurrence.
- False : Drop all duplicates.
- Returns
- deduplicatedIndex
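The following pure-Python sketch illustrates the three keep options. It assumes that the kept values retain their original relative order, as in pandas. It is an illustration on a plain list, not cuDF's implementation.

```python
from collections import Counter


def drop_duplicates(values, keep="first"):
    """Remove duplicates from a list according to `keep`."""
    if keep == "first":
        seen = set()
        out = []
        for v in values:
            if v not in seen:
                seen.add(v)
                out.append(v)
        return out
    if keep == "last":
        # Scan from the end so the last occurrence wins, then restore the order.
        return drop_duplicates(values[::-1], keep="first")[::-1]
    if keep is False:
        counts = Counter(values)
        return [v for v in values if counts[v] == 1]  # only never-repeated values
    raise ValueError("keep must be 'first', 'last' or False")


data = [1, 2, 1, 3, 2]
print(drop_duplicates(data))               # [1, 2, 3]
print(drop_duplicates(data, keep="last"))  # [1, 3, 2]
print(drop_duplicates(data, keep=False))   # [3]
```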
-
dropna(how='any')¶ Return an Index with null values removed.
- Parameters
- how{‘any’, ‘all’}, default ‘any’
If the Index is a MultiIndex, drop the value when any or all levels are NaN.
- Returns
- validIndex
Examples
>>> import cudf >>> index = cudf.Index(['a', None, 'b', 'c']) >>> index StringIndex(['a' None 'b' 'c'], dtype='object') >>> index.dropna() StringIndex(['a' 'b' 'c'], dtype='object')
Using dropna on a MultiIndex:
>>> midx = cudf.MultiIndex( ... levels=[[1, None, 4, None], [1, 2, 5]], ... codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]], ... names=["x", "y"], ... ) >>> midx MultiIndex(levels=[0 1 1 null 2 4 3 null dtype: int64, 0 1 1 2 2 5 dtype: int64], codes= x y 0 0 0 1 0 2 2 1 1 3 2 1 4 3 0) >>> midx.dropna() MultiIndex(levels=[0 1 1 4 dtype: int64, 0 1 1 2 2 5 dtype: int64], codes= x y 0 0 0 1 0 2 2 1 1)
-
property
dtype¶ dtype of the underlying values in GenericIndex.
-
property
empty¶ Indicator whether Index is empty.
True if Index is entirely empty (no items).
- Returns
- outbool
If Index is empty, return True, if not return False.
-
equals(other)¶ Determine if two Index objects contain the same elements.
- Returns
- out: bool
True if “other” is an Index and it has the same elements as calling index; False otherwise.
-
exp()¶ Get the exponential of all elements, element-wise.
Exponential is the inverse of the log function, so that x.exp().log() = x.
- Returns
- DataFrame/Series/Index
Result of the element-wise exponential.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100]) >>> ser 0 -1.00000 1 0.00000 2 1.00000 3 0.32434 4 0.50000 5 -10.00000 6 100.00000 dtype: float64 >>> ser.exp() 0 3.678794e-01 1 1.000000e+00 2 2.718282e+00 3 1.383117e+00 4 1.648721e+00 5 4.539993e-05 6 2.688117e+43 dtype: float64
exp operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5], ... 'second': [0.234, 0.3, 10]}) >>> df first second 0 -1.0 0.234 1 -10.0 0.300 2 0.5 10.000 >>> df.exp() first second 0 0.367879 1.263644 1 0.000045 1.349859 2 1.648721 22026.465795
exp operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64') >>> index.exp() Float64Index([0.36787944117144233, 1.4918246976412703, 2.718281828459045, 1.0, 1.3498588075760032], dtype='float64')
-
fillna(value, downcast=None)¶ Fill null values with the specified value.
- Parameters
- valuescalar
Scalar value to use to fill nulls. This value cannot be list-like.
- downcastdict, default is None
This parameter is currently NON-FUNCTIONAL.
- Returns
- filledIndex
Examples
>>> import cudf >>> index = cudf.Index([1, 2, None, 4]) >>> index Int64Index([1, 2, null, 4], dtype='int64') >>> index.fillna(3) Int64Index([1, 2, 3, 4], dtype='int64')
-
find_label_range(first, last)¶ Find range that starts with first and ends with last, inclusively.
- Returns
- begin, end2-tuple of int
The starting index and the ending index. The last value occurs at position end - 1.
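The following sketch uses the standard-library bisect module to illustrate the returned (begin, end) pair. It assumes the labels are sorted in ascending order. How cuDF treats unsorted labels, or labels that are absent, is not described here. The helper name `find_label_range` is reused only for illustration.

```python
from bisect import bisect_left, bisect_right


def find_label_range(sorted_labels, first, last):
    """Return (begin, end) such that sorted_labels[begin:end] spans first..last inclusively."""
    begin = bisect_left(sorted_labels, first)  # first position with label >= first
    end = bisect_right(sorted_labels, last)    # one past the last position with label <= last
    return begin, end


labels = [1, 2, 2, 3, 5]
begin, end = find_label_range(labels, 2, 3)
print(begin, end)         # 1 4
print(labels[begin:end])  # [2, 2, 3]
print(labels[end - 1])    # 3, the last matching value sits at end - 1
```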
-
classmethod
from_pandas(index, nan_as_null=None)¶ Convert from a Pandas Index.
- Parameters
- indexPandas Index object
A Pandas Index object which has to be converted to cuDF Index.
- nan_as_nullbool, Default None
If None or True, converts np.nan values to null values. If False, leaves np.nan values as is.
- Raises
- TypeError for invalid input type.
Examples
>>> import cudf >>> import pandas as pd >>> import numpy as np >>> data = [10, 20, 30, np.nan] >>> pdi = pd.Index(data) >>> cudf.core.index.Index.from_pandas(pdi) Float64Index([10.0, 20.0, 30.0, null], dtype='float64') >>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False) Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
-
get_level_values(level)¶ Return an Index of values for requested level.
This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.
- Parameters
- levelint or str
It is either the integer position or the name of the level.
- Returns
- Index
Calling object, as there is only one level in the Index.
See also
cudf.core.multiindex.MultiIndex.get_level_valuesGet values for a level of a MultiIndex.
Notes
For Index, level should be 0, since there are no multiple levels.
Examples
>>> import cudf >>> idx = cudf.core.index.StringIndex(["a","b","c"]) >>> idx.get_level_values(0) StringIndex(['a' 'b' 'c'], dtype='object')
-
get_slice_bound(label, side, kind)¶ Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.
- Parameters
- labelobject
- side{‘left’, ‘right’}
- kind{‘ix’, ‘loc’, ‘getitem’}
- Returns
- int
Index of label.
-
property
gpu_values¶ View the data as a numba device array object.
-
interleave_columns()¶ Interleave Series columns of a table into a single column.
Converts the column major table cols into a row major column.
- Parameters
- colsinput Table containing columns to interleave.
- Returns
- The interleaved columns as a single column
Examples
>>> import cudf >>> df = cudf.DataFrame({0: ['A1', 'A2', 'A3'], 1: ['B1', 'B2', 'B3']}) >>> df 0 1 0 A1 B1 1 A2 B2 2 A3 B3 >>> df.interleave_columns() 0 A1 1 B1 2 A2 3 B2 4 A3 5 B3
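The following pure-Python sketch illustrates the column-major to row-major conversion. It represents each column as a list and emits row 0 of every column, then row 1, and so on. It is an illustration only; equal-length columns are assumed and checked.

```python
def interleave_columns(columns):
    """Flatten equal-length columns row by row into a single list."""
    nrows = len(columns[0]) if columns else 0
    if any(len(col) != nrows for col in columns):
        raise ValueError("all columns must have the same length")
    return [col[i] for i in range(nrows) for col in columns]


print(interleave_columns([["A1", "A2", "A3"], ["B1", "B2", "B3"]]))
# ['A1', 'B1', 'A2', 'B2', 'A3', 'B3']
```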
-
property
is_monotonic¶ Alias for is_monotonic_increasing.
-
property
is_monotonic_decreasing¶ Return True if the index values are monotonic decreasing (only equal or decreasing values).
-
property
is_monotonic_increasing¶ Return True if the index values are monotonic increasing (only equal or increasing values).
-
property
is_unique¶ Return True if the index has unique values.
-
isin(values)¶ Return a boolean array where the index values are in values.
Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.
- Parameters
- valuesset, list-like, Index
Sought values.
- Returns
- is_containedcupy array
CuPy array of boolean values.
-
isna()¶ Identify missing values. Alias for isnull.
-
isnull()¶ Identify missing values.
-
join(other, how='left', level=None, return_indexers=False, sort=False)¶ Compute join_index and indexers to conform data structures to the new index.
- Parameters
- otherIndex.
- how{‘left’, ‘right’, ‘inner’, ‘outer’}
- return_indexersbool, default False
- sortbool, default False
Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).
- Returns: index
Examples
>>> import cudf >>> lhs = cudf.DataFrame( ... {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b'] ... ).index >>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index >>> lhs.join(rhs, how='inner') MultiIndex(levels=[0 1 1 3 dtype: int64, 0 2 1 4 dtype: int64], codes= a b 0 1 1 1 0 0)
-
log()¶ Get the natural logarithm of all elements, element-wise.
Natural logarithm is the inverse of the exp function, so that x.log().exp() = x for x > 0.
- Returns
- DataFrame/Series/Index
Result of the element-wise natural logarithm.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100]) >>> ser 0 -1.00000 1 0.00000 2 1.00000 3 0.32434 4 0.50000 5 -10.00000 6 100.00000 dtype: float64 >>> ser.log() 0 NaN 1 -inf 2 0.000000 3 -1.125963 4 -0.693147 5 NaN 6 4.605170 dtype: float64
log operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5], ... 'second': [0.234, 0.3, 10]}) >>> df first second 0 -1.0 0.234 1 -10.0 0.300 2 0.5 10.000 >>> df.log() first second 0 NaN -1.452434 1 NaN -1.203973 2 -0.693147 2.302585
log operation on Index:
>>> index = cudf.Index([10, 11, 500.0]) >>> index Float64Index([10.0, 11.0, 500.0], dtype='float64') >>> index.log() Float64Index([2.302585092994046, 2.3978952727983707, 6.214608098422191], dtype='float64')
-
mask(cond, other=None, inplace=False)¶ Replace values where the condition is True.
- Parameters
- condbool Series/DataFrame, array-like
Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.
- other: scalar, list of scalars, Series/DataFrame
Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.
For a DataFrame, other must be a scalar, an array-like of scalars, or a DataFrame with the same shape as self.
For a Series, other must be a scalar or a Series-like object of the same length.
- inplacebool, default False
Whether to perform the operation in place on the data.
- Returns
- Same type as caller
Examples
>>> import cudf >>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]}) >>> df.mask(df % 2 == 0, [-1, -1]) A B 0 1 3 1 -1 5 2 5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0]) >>> ser.mask(ser > 2, 10) 0 10 1 10 2 2 3 1 4 0 dtype: int64 >>> ser.mask(ser > 2) 0 null 1 null 2 2 3 1 4 0 dtype: int64
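The following pure-Python sketch shows the replacement rule for a scalar other, with None standing in for null. It also shows that Index.where, documented below, uses the complementary condition. This is an illustration on plain lists. Array-like and DataFrame values for other are not modeled.

```python
def mask(values, cond, other=None):
    """Replace values where cond is True (None stands in for null)."""
    if len(cond) != len(values):
        raise ValueError("cond must have the same length as values")
    return [other if c else v for v, c in zip(values, cond)]


def where(values, cond, other=None):
    """Complement of mask: replace values where cond is False."""
    return mask(values, [not c for c in cond], other)


ser = [4, 3, 2, 1, 0]
cond = [v > 2 for v in ser]
print(mask(ser, cond, 10))   # [10, 10, 2, 1, 0]
print(mask(ser, cond))       # [None, None, 2, 1, 0]
print(where(ser, cond, 15))  # [4, 3, 15, 15, 15]
```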
-
max()¶ Return the maximum value of the Index.
- Returns
- scalar
Maximum value.
See also
Index.minReturn the minimum value in an Index.
cudf.core.series.Series.maxReturn the maximum value in a Series.
cudf.core.dataframe.DataFrame.maxReturn the maximum values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.max() 3
-
memory_usage(deep=False)¶ Memory usage of the values.
- Parameters
- deepbool
Introspect the data deeply, interrogate object dtypes for system-level memory consumption.
- Returns
- bytes used
-
min()¶ Return the minimum value of the Index.
- Returns
- scalar
Minimum value.
See also
Index.maxReturn the maximum value in an Index.
cudf.core.series.Series.minReturn the minimum value in a Series.
cudf.core.dataframe.DataFrame.minReturn the minimum values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.min() 1
-
property
name¶ Returns the name of the Index.
-
property
names¶ Returns a tuple containing the name of the Index.
-
property
ndim¶ Dimension of the data. Apart from MultiIndex, ndim is always 1.
-
notna()¶ Identify non-missing values. Alias for notnull.
-
notnull()¶ Identify non-missing values.
-
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)¶ Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.
- Parameters
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
Index to direct ranking.
- method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’
How to rank the group of records that have the same value (i.e. ties):
* average: average rank of the group
* min: lowest rank in the group
* max: highest rank in the group
* first: ranks assigned in order they appear in the array
* dense: like ‘min’, but rank always increases by 1 between groups.
- numeric_onlybool, optional
For DataFrame objects, rank only numeric columns if set to True.
- na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’
How to rank NaN values:
* keep: assign NaN rank to NaN values
* top: assign smallest rank to NaN values if ascending
* bottom: assign highest rank to NaN values if ascending.
- ascendingbool, default True
Whether or not the elements should be ranked in ascending order.
- pctbool, default False
Whether or not to display the returned rankings in percentile form.
- Returns
- same type as caller
Return a Series or DataFrame with data ranks as values.
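The following pure-Python sketch shows how each tie-breaking method assigns ranks. It covers ascending order only. na_option, pct and axis are not modeled, and the function is an illustration rather than cuDF's implementation. A stable sort groups equal values, and every member of a tie group then receives a rank according to method.

```python
def rank(values, method="average"):
    """Return 1-based float ranks of `values`, resolving ties according to `method`."""
    if method not in ("average", "min", "max", "first", "dense"):
        raise ValueError(f"unknown method: {method!r}")
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)  # stable: ties keep input order
    ranks = [0.0] * n
    dense = 0
    i = 0
    while i < n:
        # Advance j to the last sorted position of the current tie group.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        dense += 1  # one dense rank per distinct value
        for k in range(i, j + 1):
            if method == "average":
                r = (i + j) / 2 + 1
            elif method == "min":
                r = i + 1
            elif method == "max":
                r = j + 1
            elif method == "first":
                r = k + 1
            else:  # "dense"
                r = dense
            ranks[order[k]] = float(r)
        i = j + 1
    return ranks


data = [10, 20, 20, 30]
for m in ("average", "min", "max", "first", "dense"):
    print(m, rank(data, m))
# average [1.0, 2.5, 2.5, 4.0]
# min [1.0, 2.0, 2.0, 4.0]
# max [1.0, 3.0, 3.0, 4.0]
# first [1.0, 2.0, 3.0, 4.0]
# dense [1.0, 2.0, 2.0, 3.0]
```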
-
rename(name, inplace=False)¶ Alter Index name.
Defaults to returning a new index.
- Parameters
- namelabel
Name(s) to set.
- Returns
- Index
-
repeat(repeats, axis=None)¶ Repeats elements consecutively.
Returns a new object of the caller's type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.
- Parameters
- repeatsint, or array of ints
The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.
- Returns
- Series/DataFrame/Index
A newly created object of same type as caller with repeated elements.
Examples
>>> import cudf >>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]}) >>> df a b 0 1 10 1 2 20 2 3 30 >>> df.repeat(3) a b 0 1 10 0 1 10 0 1 10 1 2 20 1 2 20 1 2 20 2 3 30 2 3 30 2 3 30
Repeat on Series
>>> s = cudf.Series([0, 2]) >>> s 0 0 1 2 dtype: int64 >>> s.repeat([3, 4]) 0 0 0 0 0 0 1 2 1 2 1 2 1 2 dtype: int64 >>> s.repeat(2) 0 0 0 0 1 2 1 2 dtype: int64
Repeat on Index
>>> index = cudf.Index([10, 22, 33, 55]) >>> index Int64Index([10, 22, 33, 55], dtype='int64') >>> index.repeat(5) Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33, 33, 33, 33, 33, 55, 55, 55, 55, 55], dtype='int64')
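The following pure-Python sketch shows how an int or a per-element array of repeats expands the values. It mirrors the Series examples above on a plain list and is not cuDF's implementation.

```python
def repeat(values, repeats):
    """Repeat each element consecutively; repeats is an int or one count per element."""
    if isinstance(repeats, int):
        repeats = [repeats] * len(values)
    if len(repeats) != len(values):
        raise ValueError("repeats must be an int or match the length of values")
    if any(r < 0 for r in repeats):
        raise ValueError("repeats must be non-negative")
    return [v for v, r in zip(values, repeats) for _ in range(r)]


print(repeat([0, 2], [3, 4]))  # [0, 0, 0, 2, 2, 2, 2]
print(repeat([0, 2], 2))       # [0, 0, 2, 2]
print(repeat([0, 2], 0))       # [], repeating 0 times gives an empty result
```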
-
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)¶ Return a random sample of items from an axis of object.
You can use random_state for reproducibility.
- Parameters
- nint, optional
Number of items from axis to return. Cannot be used with frac. Defaults to 1 if frac is None.
- fracfloat, optional
Fraction of axis items to return. Cannot be used with n.
- replacebool, default False
Allow or disallow sampling of the same row more than once. replace=True is not yet supported for axis=1/‘columns’.
- weightsstr or ndarray-like, optional
Only supported for axis=1/‘columns’.
- random_stateint or None, default None
Seed for the random number generator (if int), or None. If None, a random seed will be chosen.
- axis{0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index don’t support axis=1.
- Returns
- Series or DataFrame or Index
A new object of same type as caller containing n items randomly sampled from the caller object.
Examples
>>> import cudf >>> df = cudf.DataFrame({"a":[1, 2, 3, 4, 5]}) >>> df.sample(3) a 1 2 3 4 0 1
>>> sr = cudf.Series([1, 2, 3, 4, 5]) >>> sr.sample(10, replace=True) 1 4 3 1 2 4 0 5 0 1 4 5 4 1 0 2 0 3 3 2 dtype: int64
>>> df = cudf.DataFrame( ... {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]}) >>> df.sample(2, axis=1) a c 0 1 3 1 2 4
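The following sketch models how n, frac, replace and random_state interact, using Python's standard random module as a stand-in for cuDF's generator. It is not cuDF's sampler. weights and axis are not modeled, and rounding frac * length with round() is an assumption. The point is that a fixed integer seed makes draws reproducible, and that replacement allows n to exceed the number of items.

```python
import random


def sample(values, n=None, frac=None, replace=False, random_state=None):
    """Draw items from a list; an int random_state makes the draw reproducible."""
    if n is not None and frac is not None:
        raise ValueError("specify at most one of n and frac")
    if n is None:
        n = 1 if frac is None else int(round(frac * len(values)))
    rng = random.Random(random_state)  # None -> seeded from system entropy
    if replace:
        return rng.choices(values, k=n)  # the same item may be drawn repeatedly
    return rng.sample(values, n)  # raises ValueError if n > len(values)


first = sample([1, 2, 3, 4, 5], 3, random_state=0)
again = sample([1, 2, 3, 4, 5], 3, random_state=0)
print(first == again)                                           # True
print(len(sample([1, 2, 3, 4, 5], 10, replace=True, random_state=1)))  # 10
```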
-
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)¶ Scatter to a list of dataframes.
Uses map_index to determine the destination of each row of the original DataFrame.
- Parameters
- map_indexSeries, str or list-like
Scatter assignment for each row
- map_sizeint
Length of output list. Must be at least the number of unique values in map_index.
- keep_indexbool
Conserve original index values for each row
- Returns
- A list of cudf.DataFrame objects.
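The following pure-Python sketch illustrates one reading of the scatter. It assumes each map_index value is an integer destination in the range [0, map_size), and that rows keep their relative order within a destination. Both points are assumptions, not statements of cuDF's behavior. The helper name is hypothetical, and plain lists stand in for rows and DataFrames.

```python
def scatter_by_map(rows, map_index, map_size=None):
    """Hypothetical sketch: send row i to bucket map_index[i]."""
    if len(map_index) != len(rows):
        raise ValueError("map_index must have one entry per row")
    if map_size is None:
        map_size = max(map_index) + 1 if map_index else 0
    buckets = [[] for _ in range(map_size)]
    for row, dest in zip(rows, map_index):
        if not 0 <= dest < map_size:
            raise ValueError(f"destination {dest} outside [0, {map_size})")
        buckets[dest].append(row)  # rows keep their relative order per bucket
    return buckets


print(scatter_by_map(["a", "b", "c", "d"], [1, 0, 1, 2]))
# [['b'], ['a', 'c'], ['d']]
```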
-
searchsorted(values, side='left', ascending=True, na_position='last')¶ Find indices where elements should be inserted to maintain order
- Parameters
- valuesFrame (Shape must be consistent with self)
Values to be hypothetically inserted into self.
- sidestr {‘left’, ‘right’} optional, default ‘left’
If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.
- ascendingbool optional, default True
Sorted Frame is in ascending order (otherwise descending).
- na_positionstr {‘last’, ‘first’} optional, default ‘last’
Position of null values in sorted order
- Returns
- 1-D cupy array of insertion points
Examples
>>> import cudf >>> s = cudf.Series([1, 2, 3]) >>> s.searchsorted(4) 3 >>> s.searchsorted([0, 4]) array([0, 3], dtype=int32) >>> s.searchsorted([1, 3], side='left') array([0, 2], dtype=int32) >>> s.searchsorted([1, 3], side='right') array([1, 3], dtype=int32)
If the values are not monotonically sorted, wrong locations may be returned:
>>> s = cudf.Series([2, 1, 3]) >>> s.searchsorted(1) 0 # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]}) >>> df a b 0 1 10 1 3 12 2 5 14 3 7 16 >>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6], ... 'b': [10, 11, 13, 15]}) >>> values_df a b 0 0 10 1 2 11 2 5 13 3 6 15 >>> df.searchsorted(values_df, ascending=False) array([4, 4, 4, 0], dtype=int32)
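The following sketch reproduces the single-column Series results above using Python's bisect module. side='left' corresponds to bisect_left and side='right' to bisect_right. It assumes ascending data with no nulls. The descending, na_position and multi-column cases are not modeled, and the code is not cuDF's implementation.

```python
from bisect import bisect_left, bisect_right


def searchsorted(sorted_values, values, side="left"):
    """Insertion points that keep `sorted_values` (ascending) in order."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    find = bisect_left if side == "left" else bisect_right
    if isinstance(values, (list, tuple)):
        return [find(sorted_values, v) for v in values]
    return find(sorted_values, values)


s = [1, 2, 3]
print(searchsorted(s, 4))                     # 3
print(searchsorted(s, [0, 4]))                # [0, 3]
print(searchsorted(s, [1, 3], side="left"))   # [0, 2]
print(searchsorted(s, [1, 3], side="right"))  # [1, 3]
```

As the note above says, bisection only gives meaningful answers when the input is already sorted.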
-
property
shape¶ Returns a tuple representing the dimensionality of the Index.
-
shift(periods=1, freq=None, axis=0, fill_value=None)¶ Shift values by periods positions.
-
sin()¶ Get Trigonometric sine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360]) >>> ser 0 0.00000 1 0.32434 2 0.50000 3 45.00000 4 90.00000 5 180.00000 6 360.00000 dtype: float64 >>> ser.sin() 0 0.000000 1 0.318683 2 0.479426 3 0.850904 4 0.893997 5 -0.801153 6 0.958916 dtype: float64
sin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15], ... 'second': [100.0, 360, 720, 300]}) >>> df first second 0 0.0 100.0 1 5.0 360.0 2 10.0 720.0 3 15.0 300.0 >>> df.sin() first second 0 0.000000 -0.506366 1 -0.958924 0.958916 2 -0.544021 -0.544072 3 0.650288 -0.999756
sin operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90]) >>> index Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64') >>> index.sin() Float64Index([-0.3894183423086505, -0.5063656411097588, 0.8011526357338306, 0.8939966636005579], dtype='float64')
-
property
size¶ Return the number of elements in the underlying data.
- Returns
- sizeSize of the DataFrame / Index / Series / MultiIndex
Examples
Size of an empty dataframe is 0.
>>> import cudf >>> df = cudf.DataFrame() >>> df Empty DataFrame Columns: [] Index: [] >>> df.size 0 >>> df = cudf.DataFrame(index=[1, 2, 3]) >>> df Empty DataFrame Columns: [] Index: [1, 2, 3] >>> df.size 0
DataFrame with values
>>> df = cudf.DataFrame({'a': [10, 11, 12], ... 'b': ['hello', 'rapids', 'ai']}) >>> df a b 0 10 hello 1 11 rapids 2 12 ai >>> df.size 6 >>> df.index RangeIndex(start=0, stop=3) >>> df.index.size 3
Size of an Index
>>> index = cudf.Index([]) >>> index Float64Index([], dtype='float64') >>> index.size 0 >>> index = cudf.Index([1, 2, 3, 10]) >>> index Int64Index([1, 2, 3, 10], dtype='int64') >>> index.size 4
Size of a MultiIndex
>>> midx = cudf.MultiIndex( ... levels=[["a", "b", "c", None], ["1", None, "5"]], ... codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]], ... names=["x", "y"], ... ) >>> midx MultiIndex(levels=[0 a 1 b 2 c 3 None dtype: object, 0 1 1 None 2 5 dtype: object], codes= x y 0 0 0 1 0 2 2 1 1 3 2 1 4 3 0) >>> midx.size 5
-
sort_values(return_indexer=False, ascending=True, key=None)¶ Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
- Parameters
- return_indexerbool, default False
Should the indices that would sort the index be returned.
- ascendingbool, default True
Should the index values be sorted in an ascending order.
- keyNone, optional
This parameter is NON-FUNCTIONAL.
- Returns
- sorted_indexIndex
Sorted copy of the index.
- indexercupy.ndarray, optional
The indices that the index itself was sorted by.
See also
cudf.core.series.Series.sort_valuesSort values of a Series.
cudf.core.dataframe.DataFrame.sort_valuesSort values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([10, 100, 1, 1000]) >>> idx Int64Index([10, 100, 1, 1000], dtype='int64')
Sort values in ascending order (default behavior):
>>> idx.sort_values() Int64Index([1, 10, 100, 1000], dtype='int64')
Sort values in descending order, and also get the indices idx was sorted by:
>>> idx.sort_values(ascending=False, return_indexer=True) (Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2], dtype=int32))
-
sqrt()¶ Get the non-negative square-root of all elements, element-wise.
- Returns
- DataFrame/Series/Index
Result of the non-negative square-root of each element.
Examples
>>> import cudf >>> ser = cudf.Series([10, 25, 81, 1.0, 100]) >>> ser 0 10.0 1 25.0 2 81.0 3 1.0 4 100.0 dtype: float64 >>> ser.sqrt() 0 3.162278 1 5.000000 2 9.000000 3 1.000000 4 10.000000 dtype: float64
sqrt operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-10.0, 100, 625], ... 'second': [1, 2, 0.4]}) >>> df first second 0 -10.0 1.0 1 100.0 2.0 2 625.0 0.4 >>> df.sqrt() first second 0 NaN 1.000000 1 10.0 1.414214 2 25.0 0.632456
sqrt operation on Index:
>>> index = cudf.Index([-10.0, 100, 625]) >>> index Float64Index([-10.0, 100.0, 625.0], dtype='float64') >>> index.sqrt() Float64Index([nan, 10.0, 25.0], dtype='float64')
-
sum()¶ Return the sum of all values of the Index.
- Returns
- scalar
Sum of all values.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.sum() 6
-
take(indices)¶ Gather only the specified subset of indices.
- Parameters
- indices: An array-like of integer positions to gather from this Index.
-
tan()¶ Get Trigonometric tangent, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360]) >>> ser 0 0.00000 1 0.32434 2 0.50000 3 45.00000 4 90.00000 5 180.00000 6 360.00000 dtype: float64 >>> ser.tan() 0 0.000000 1 0.336213 2 0.546302 3 1.619775 4 -1.995200 5 1.338690 6 -3.380140 dtype: float64
tan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15], ... 'second': [100.0, 360, 720, 300]}) >>> df first second 0 0.0 100.0 1 5.0 360.0 2 10.0 720.0 3 15.0 300.0 >>> df.tan() first second 0 0.000000 -0.587214 1 -3.380515 -3.380140 2 0.648361 0.648446 3 -0.855993 45.244742
tan operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90]) >>> index Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64') >>> index.tan() Float64Index([-0.4227932187381618, -0.587213915156929, -1.3386902103511544, -1.995200412208242], dtype='float64')
-
tile(count)¶ Repeats the rows from self DataFrame count times to form a new DataFrame.
- Parameters
- selfinput Table whose rows are to be tiled.
- countNumber of times to tile “rows”. Must be non-negative.
- Returns
- The table containing the tiled “rows”.
Examples
>>> import cudf >>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]]) >>> count = 2 >>> df.tile(count) 0 1 2 0 8 4 7 1 5 2 3 0 8 4 7 1 5 2 3
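The following pure-Python sketch shows what tiling the rows means, using a list of row lists in place of a DataFrame. The whole block of rows is repeated count times in order, which matches the output above. It is an illustration only.

```python
def tile(rows, count):
    """Repeat the whole sequence of rows `count` times."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return [list(row) for _ in range(count) for row in rows]


print(tile([[8, 4, 7], [5, 2, 3]], 2))
# [[8, 4, 7], [5, 2, 3], [8, 4, 7], [5, 2, 3]]
```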
-
to_array(fillna=None)¶ Get a dense numpy array for the data.
- Parameters
- fillnastr or None
Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs, and integral dtypes are promoted to np.float64 so that NaN can be represented.
Notes
If fillna is None, null values are skipped, so the output may be smaller than the input.
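The following pure-Python sketch contrasts the two fillna modes, with None standing in for null. It assumes integer values become floats when NaN is filled, because integer arrays cannot hold NaN. The real method returns a NumPy array rather than a list.

```python
import math


def to_array(values, fillna=None):
    """Sketch of the two fillna modes on a plain list."""
    if fillna is None:
        return [v for v in values if v is not None]  # nulls dropped: output may be shorter
    if fillna == "pandas":
        return [math.nan if v is None else float(v) for v in values]  # same length, floats
    raise ValueError("fillna must be None or 'pandas'")


print(to_array([1, None, 3]))                   # [1, 3]
print(to_array([1, None, 3], fillna="pandas"))  # [1.0, nan, 3.0]
```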
-
to_arrow()¶ Convert Index to a PyArrow Array.
Examples
>>> import cudf >>> idx = cudf.Index([-3, 10, 15, 20]) >>> idx.to_arrow() <pyarrow.lib.Int64Array object at 0x7fcaa6f53440> [ -3, 10, 15, 20 ]
-
to_dlpack()¶ Converts a cuDF object into a DLPack tensor.
DLPack is an open-source memory tensor structure: dmlc/dlpack.
This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.
- Parameters
- cudf_objDataFrame, Series, Index, or Column
- Returns
- pycapsule_objPyCapsule
Output DLPack tensor pointer which is encapsulated in a PyCapsule object.
-
to_frame(index=True, name=None)¶ Create a DataFrame with a column containing this Index.
- Parameters
- indexboolean, default True
Set the index of the returned DataFrame as the original Index.
- namestr, default None
Name to be used for the column
- Returns
- DataFrame
cudf DataFrame
-
to_pandas()¶ Convert to a Pandas Index.
Examples
>>> import cudf >>> idx = cudf.Index([-3, 10, 15, 20]) >>> idx Int64Index([-3, 10, 15, 20], dtype='int64') >>> idx.to_pandas() Int64Index([-3, 10, 15, 20], dtype='int64') >>> type(idx.to_pandas()) <class 'pandas.core.indexes.numeric.Int64Index'> >>> type(idx) <class 'cudf.core.index.GenericIndex'>
-
to_series(index=None, name=None)¶ Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.
- Parameters
- indexIndex, optional
Index of resulting Series. If None, defaults to original index.
- namestr, optional
Name of resulting Series. If None, defaults to name of original index.
- Returns
- Series
The dtype will be based on the type of the Index values.
-
unique()¶ Return unique values in the index.
- Returns
- Index without duplicates
-
property
values¶ Return an array representing the data in the Index.
- Returns
- arrayA cupy array of data in the Index.
Examples
>>> import cudf >>> index = cudf.Index([1, -10, 100, 20]) >>> index.values array([ 1, -10, 100, 20]) >>> type(index.values) <class 'cupy.core.core.ndarray'>
-
property
values_host¶ Return a numpy representation of the Index.
Only the values in the Index will be returned.
- Returns
- outnumpy.ndarray
The values of the Index.
Examples
>>> import cudf >>> index = cudf.Index([1, -10, 100, 20]) >>> index.values_host array([ 1, -10, 100, 20]) >>> type(index.values_host) <class 'numpy.ndarray'>
-
where(cond, other=None)¶ Replace values where the condition is False.
- Parameters
- condbool array-like with the same length as self
Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.
- other: scalar, or array-like
Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.
- Returns
- Same type as caller
Examples
>>> import cudf >>> index = cudf.Index([4, 3, 2, 1, 0]) >>> index Int64Index([4, 3, 2, 1, 0], dtype='int64') >>> index.where(index > 2, 15) Int64Index([4, 3, 15, 15, 15], dtype='int64')
StringIndex¶
-
class
cudf.core.index.StringIndex(values, **kwargs)¶ Immutable, ordered and sliceable sequence of string labels. The basic object storing row labels for all cuDF objects.
- Parameters
- dataarray-like (1-dimensional)/ DataFrame
If it is a DataFrame, it will return a MultiIndex
- dtypeNumPy dtype (default: object)
If dtype is None, we find the dtype that best fits the data.
- copybool
Make a copy of input data.
- nameobject
Name to be stored in the index.
- tupleize_colsbool (default: True)
When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.
- Returns
- Index
cudf Index
Examples
>>> import cudf >>> cudf.Index([1, 2, 3], dtype="uint64", name="a") UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]})) MultiIndex(levels=[0 1 1 2 dtype: int64, 0 2 1 3 dtype: int64], codes= a b 0 0 0 1 1 1)
- Attributes
dtypedtype of the underlying values in GenericIndex.
emptyIndicator whether Index is empty.
gpu_valuesView the data as a numba device array object
is_monotonicAlias for is_monotonic_increasing.
is_monotonic_decreasingReturn if the index is monotonic decreasing (only equal or decreasing) values.
is_monotonic_increasingReturn if the index is monotonic increasing (only equal or increasing) values.
is_uniqueReturn if the index has unique values.
nameReturns the name of the Index.
namesReturns a tuple containing the name of the Index.
ndimDimension of the data.
shapeReturns a tuple representing the dimensionality of the Index.
sizeReturn the number of elements in the underlying data.
strVectorized string functions for Series and Index.
valuesReturn an array representing the data in the Index.
values_hostReturn a numpy representation of the Index.
Methods
acos()Get Trigonometric inverse cosine, element-wise.
any()Return whether any element is True in Index.
append(other)Append a collection of Index options together.
argsort([ascending])Return the integer indices that would sort the index.
asin()Get Trigonometric inverse sine, element-wise.
astype(dtype[, copy])Create an Index with values cast to dtypes.
atan()Get Trigonometric inverse tangent, element-wise.
clip([lower, upper, inplace, axis])Trim values at input threshold(s).
copy([deep])Make a copy of this object.
cos()Get Trigonometric cosine, element-wise.
difference(other[, sort])Return a new Index with elements from the index that are not in other.
drop_duplicates([keep])Return Index with duplicate values removed
dropna([how])Return an Index with null values removed.
equals(other)Determine if two Index objects contain the same elements.
exp()Get the exponential of all elements, element-wise.
fillna(value[, downcast])Fill null values with the specified value.
find_label_range(first, last)Find range that starts with first and ends with last, inclusively.
from_pandas(index[, nan_as_null])Convert from a Pandas Index.
get_level_values(level)Return an Index of values for requested level.
get_slice_bound(label, side, kind)Calculate slice bound that corresponds to given label.
interleave_columns()Interleave Series columns of a table into a single column.
isin(values)Return a boolean array where the index values are in values.
isna()Identify missing values.
isnull()Identify missing values.
join(other[, how, level, return_indexers, sort])Compute join_index and indexers to conform data structures to the new index.
log()Get the natural logarithm of all elements, element-wise.
mask(cond[, other, inplace])Replace values where the condition is True.
max()Return the maximum value of the Index.
memory_usage([deep])Memory usage of the values.
min()Return the minimum value of the Index.
notna()Identify non-missing values.
notnull()Identify non-missing values.
rank([axis, method, numeric_only, …])Compute numerical data ranks (1 through n) along axis.
rename(name[, inplace])Alter Index name.
repeat(repeats[, axis])Repeats elements consecutively.
sample([n, frac, replace, weights, …])Return a random sample of items from an axis of object.
scatter_by_map(map_index[, map_size, keep_index])Scatter to a list of dataframes.
searchsorted(values[, side, ascending, …])Find indices where elements should be inserted to maintain order
shift([periods, freq, axis, fill_value])Shift values by periods positions.
sin()Get Trigonometric sine, element-wise.
sort_values([return_indexer, ascending, key])Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
sqrt()Get the non-negative square-root of all elements, element-wise.
sum()Return the sum of all values of the Index.
take(indices)Gather only the specific subset of indices
tan()Get Trigonometric tangent, element-wise.
tile(count)Repeats the rows from self DataFrame count times to form a new DataFrame.
to_array([fillna])Get a dense numpy array for the data.
to_arrow()Convert Index to a PyArrow Array.
to_dlpack()Converts a cuDF object into a DLPack tensor.
to_frame([index, name])Create a DataFrame with a column containing this Index
to_pandas()Convert to a Pandas Index.
to_series([index, name])Create a Series with both index and values equal to the index keys.
unique()Return unique values in the index.
where(cond[, other])Replace values where the condition is False.
replace
-
acos()¶ Get Trigonometric inverse cosine, element-wise.
The inverse of cos so that, if y = x.cos(), then x = y.acos()
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5]) >>> ser.acos() 0 3.141593 1 1.570796 2 0.000000 3 1.240482 4 1.047198 dtype: float64
acos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5], ... 'second': [0.234, 0.3, 0.1]}) >>> df first second 0 -1.0 0.234 1 0.0 0.300 2 0.5 0.100 >>> df.acos() first second 0 3.141593 1.334606 1 1.570796 1.266104 2 1.047198 1.470629
acos operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64') >>> index.acos() Float64Index([ 3.141592653589793, 1.1592794807274085, 0.0, 1.5707963267948966, 1.266103672779499], dtype='float64')
-
any()¶ Return whether any element is True in Index.
-
append(other)¶ Append a collection of Index options together.
- Parameters
- otherIndex or list/tuple of indices
- Returns
- appendedIndex
Examples
>>> import cudf >>> idx = cudf.Index([1, 2, 10, 100]) >>> idx Int64Index([1, 2, 10, 100], dtype='int64') >>> other = cudf.Index([200, 400, 50]) >>> other Int64Index([200, 400, 50], dtype='int64') >>> idx.append(other) Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')
append accepts list of Index objects
>>> idx.append([other, other]) Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
-
argsort(ascending=True, **kwargs)¶ Return the integer indices that would sort the index.
- Parameters
- ascendingbool, default True
If True, returns the indices for ascending order. If False, returns the indices for descending order.
- Returns
- arrayA cupy array containing integer indices that
would sort the index if used as an indexer.
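The argsort contract can be modeled in plain Python. This sketch only illustrates the result and is not cuDF's GPU implementation. `argsort_sketch` is a hypothetical helper name. Reading the values at the returned positions yields them in sorted order.

```python
def argsort_sketch(values, ascending=True):
    """Return positions that would sort `values` (hypothetical helper, not a cuDF API)."""
    # sorted() is stable, and reverse=True still keeps ties in their original order.
    return sorted(range(len(values)), key=values.__getitem__, reverse=not ascending)


values = [10, 100, 1, 1000]
order = argsort_sketch(values)
print(order)                                    # [2, 0, 1, 3]
print([values[i] for i in order])               # [1, 10, 100, 1000]
print(argsort_sketch(values, ascending=False))  # [3, 1, 0, 2]
```

The descending positions match the indexer that `sort_values(ascending=False, return_indexer=True)` reports for the same data later in this page. The sketch makes no claim about how cuDF orders ties.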
-
asin()¶ Get Trigonometric inverse sine, element-wise.
The inverse of sine so that, if y = x.sin(), then x = y.asin()
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5]) >>> ser.asin() 0 -1.570796 1 0.000000 2 1.570796 3 0.330314 4 0.523599 dtype: float64
asin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5], ... 'second': [0.234, 0.3, 0.1]}) >>> df first second 0 -1.0 0.234 1 0.0 0.300 2 0.5 0.100 >>> df.asin() first second 0 -1.570796 0.236190 1 0.000000 0.304693 2 0.523599 0.100167
asin operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64') >>> index.asin() Float64Index([-1.5707963267948966, 0.41151684606748806, 1.5707963267948966, 0.3046926540153975], dtype='float64')
-
astype(dtype, copy=False)¶ Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.
- Parameters
- dtypenumpy dtype
Use a numpy.dtype to cast entire Index object to.
- copybool, default False
If copy is True, astype always returns a newly allocated object. If copy is False (the default) and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.
- Returns
- Index
Index with values cast to specified dtype.
-
atan()¶ Get Trigonometric inverse tangent, element-wise.
The inverse of tan so that, if y = x.tan(), then x = y.atan()
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10]) >>> ser 0 -1.00000 1 0.00000 2 1.00000 3 0.32434 4 0.50000 5 -10.00000 dtype: float64 >>> ser.atan() 0 -0.785398 1 0.000000 2 0.785398 3 0.313635 4 0.463648 5 -1.471128 dtype: float64
atan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5], ... 'second': [0.234, 0.3, 10]}) >>> df first second 0 -1.0 0.234 1 -10.0 0.300 2 0.5 10.000 >>> df.atan() first second 0 -0.785398 0.229864 1 -1.471128 0.291457 2 0.463648 1.471128
atan operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64') >>> index.atan() Float64Index([-0.7853981633974483, 0.3805063771123649, 0.7853981633974483, 0.0, 0.2914567944778671], dtype='float64')
-
clip(lower=None, upper=None, inplace=False, axis=1)¶ Trim values at input threshold(s).
Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.
- Parameters
- lowerscalar or array_like, default None
Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.
- upperscalar or array_like, default None
Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series/Index, upper is expected to be a scalar or an array of size 1.
- inplacebool, default False
Whether to perform the operation in place on the data.
- Returns
- Clipped DataFrame/Series/Index/MultiIndex
Examples
>>> import cudf >>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']}) >>> df.clip(lower=[2, 'b'], upper=[3, 'c']) a b 0 2 b 1 2 b 2 3 c 3 3 c
>>> df.clip(lower=None, upper=[3, 'c']) a b 0 1 a 1 2 b 2 3 c 3 3 c
>>> df.clip(lower=[2, 'b'], upper=None) a b 0 2 b 1 2 b 2 3 c 3 4 d
>>> df.clip(lower=2, upper=3, inplace=True) >>> df a b 0 2 2 1 2 3 2 3 3 3 3 3
>>> import cudf >>> sr = cudf.Series([1, 2, 3, 4]) >>> sr.clip(lower=2, upper=3) 0 2 1 2 2 3 3 3 dtype: int64
>>> sr.clip(lower=None, upper=3) 0 1 1 2 2 3 3 3 dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True) >>> sr 0 2 1 2 2 3 3 4 dtype: int64
-
copy(deep=True)¶ Make a copy of this object.
- Parameters
- deepbool, default True
Make a deep copy of the data. With deep=False, the data is not copied.
- Returns
- copyIndex
-
cos()¶ Get Trigonometric cosine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360]) >>> ser 0 0.00000 1 0.32434 2 0.50000 3 45.00000 4 90.00000 5 180.00000 6 360.00000 dtype: float64 >>> ser.cos() 0 1.000000 1 0.947861 2 0.877583 3 0.525322 4 -0.448074 5 -0.598460 6 -0.283691 dtype: float64
cos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15], ... 'second': [100.0, 360, 720, 300]}) >>> df first second 0 0.0 100.0 1 5.0 360.0 2 10.0 720.0 3 15.0 300.0 >>> df.cos() first second 0 1.000000 0.862319 1 0.283662 -0.283691 2 -0.839072 -0.839039 3 -0.759688 -0.022097
cos operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90]) >>> index Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64') >>> index.cos() Float64Index([ 0.9210609940028851, 0.8623188722876839, -0.5984600690578581, -0.4480736161291701], dtype='float64')
-
difference(other, sort=None)¶ Return a new Index with elements from the index that are not in other.
This is the set difference of two Index objects.
- Parameters
- otherIndex or array-like
- sortFalse or None, default None
Whether to sort the resulting index. By default, the values are attempted to be sorted, but any TypeError from incomparable elements is caught by cudf.
None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.
False : Do not sort the result.
- Returns
- differenceIndex
Examples
>>> import cudf >>> idx1 = cudf.Index([2, 1, 3, 4]) >>> idx1 Int64Index([2, 1, 3, 4], dtype='int64') >>> idx2 = cudf.Index([3, 4, 5, 6]) >>> idx2 Int64Index([3, 4, 5, 6], dtype='int64') >>> idx1.difference(idx2) Int64Index([1, 2], dtype='int64') >>> idx1.difference(idx2, sort=False) Int64Index([2, 1], dtype='int64')
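The two `sort` options can be modeled in plain Python. `difference_sketch` is a hypothetical helper, not cuDF's implementation. The sketch assumes each label appears at most once in the result, as a set difference implies. With `sort=None` it attempts to sort and falls back to first-seen order when the labels cannot be compared.

```python
def difference_sketch(left, right, sort=None):
    """Unique labels of `left` that are absent from `right` (hypothetical helper)."""
    exclude = set(right)
    seen = set()
    result = []
    for label in left:
        if label not in exclude and label not in seen:
            seen.add(label)
            result.append(label)
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            pass  # incomparable labels: keep first-seen order
    return result


print(difference_sketch([2, 1, 3, 4], [3, 4, 5, 6]))              # [1, 2]
print(difference_sketch([2, 1, 3, 4], [3, 4, 5, 6], sort=False))  # [2, 1]
print(difference_sketch([1, "a"], []))                            # [1, 'a']
```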
-
drop_duplicates(keep='first')¶ Return Index with duplicate values removed
- Parameters
- keep{‘first’, ‘last’, False}, default ‘first’
‘first’ : Drop duplicates except for the first occurrence.
‘last’ : Drop duplicates except for the last occurrence.
False : Drop all duplicates.
- Returns
- deduplicatedIndex
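The effect of `keep` can be modeled in plain Python. `drop_duplicates_sketch` is a hypothetical helper, and it reports survivors in their original relative order. That ordering is a choice of this sketch only; it is not a claim about the order cuDF returns.

```python
def drop_duplicates_sketch(labels, keep="first"):
    """Model of the `keep` option (hypothetical helper, not cuDF's implementation)."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    if keep is False:
        # Drop every label that occurs more than once.
        return [label for label in labels if counts[label] == 1]
    if keep == "first":
        positions = range(len(labels))
    elif keep == "last":
        positions = range(len(labels) - 1, -1, -1)
    else:
        raise ValueError("keep must be 'first', 'last' or False")
    seen, kept = set(), set()
    for i in positions:
        if labels[i] not in seen:
            seen.add(labels[i])
            kept.add(i)
    # Survivors are listed in their original relative order.
    return [labels[i] for i in sorted(kept)]


data = [1, 2, 1, 3, 2]
print(drop_duplicates_sketch(data))              # [1, 2, 3]
print(drop_duplicates_sketch(data, keep="last")) # [1, 3, 2]
print(drop_duplicates_sketch(data, keep=False))  # [3]
```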
-
dropna(how='any')¶ Return an Index with null values removed.
- Parameters
- how{‘any’, ‘all’}, default ‘any’
If the Index is a MultiIndex, drop the value when any or all levels are NaN.
- Returns
- validIndex
Examples
>>> import cudf >>> index = cudf.Index(['a', None, 'b', 'c']) >>> index StringIndex(['a' None 'b' 'c'], dtype='object') >>> index.dropna() StringIndex(['a' 'b' 'c'], dtype='object')
Using dropna on a MultiIndex:
>>> midx = cudf.MultiIndex( ... levels=[[1, None, 4, None], [1, 2, 5]], ... codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]], ... names=["x", "y"], ... ) >>> midx MultiIndex(levels=[0 1 1 null 2 4 3 null dtype: int64, 0 1 1 2 2 5 dtype: int64], codes= x y 0 0 0 1 0 2 2 1 1 3 2 1 4 3 0) >>> midx.dropna() MultiIndex(levels=[0 1 1 4 dtype: int64, 0 1 1 2 2 5 dtype: int64], codes= x y 0 0 0 1 0 2 2 1 1)
-
property
dtype¶ dtype of the underlying values in GenericIndex.
-
property
empty¶ Indicator whether Index is empty.
True if Index is entirely empty (no items).
- Returns
- outbool
If Index is empty, return True, if not return False.
-
equals(other)¶ Determine if two Index objects contain the same elements.
- Returns
- out: bool
True if “other” is an Index and it has the same elements as calling index; False otherwise.
-
exp()¶ Get the exponential of all elements, element-wise.
Exponential is the inverse of the log function, so that x.exp().log() = x
- Returns
- DataFrame/Series/Index
Result of the element-wise exponential.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100]) >>> ser 0 -1.00000 1 0.00000 2 1.00000 3 0.32434 4 0.50000 5 -10.00000 6 100.00000 dtype: float64 >>> ser.exp() 0 3.678794e-01 1 1.000000e+00 2 2.718282e+00 3 1.383117e+00 4 1.648721e+00 5 4.539993e-05 6 2.688117e+43 dtype: float64
exp operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5], ... 'second': [0.234, 0.3, 10]}) >>> df first second 0 -1.0 0.234 1 -10.0 0.300 2 0.5 10.000 >>> df.exp() first second 0 0.367879 1.263644 1 0.000045 1.349859 2 1.648721 22026.465795
exp operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3]) >>> index Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64') >>> index.exp() Float64Index([0.36787944117144233, 1.4918246976412703, 2.718281828459045, 1.0, 1.3498588075760032], dtype='float64')
-
fillna(value, downcast=None)¶ Fill null values with the specified value.
- Parameters
- valuescalar
Scalar value to use to fill nulls. This value cannot be a list-like.
- downcastdict, default is None
This parameter is currently NON-FUNCTIONAL.
- Returns
- filledIndex
Examples
>>> import cudf >>> index = cudf.Index([1, 2, None, 4]) >>> index Int64Index([1, 2, null, 4], dtype='int64') >>> index.fillna(3) Int64Index([1, 2, 3, 4], dtype='int64')
-
find_label_range(first, last)¶ Find range that starts with first and ends with last, inclusively.
- Returns
- begin, end2-tuple of int
The starting index and the ending index. The last value occurs at position end - 1.
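For a monotonically increasing index, the returned pair behaves like a half-open slice. This sketch models that with Python's `bisect` module. `find_label_range_sketch` is a hypothetical helper. cuDF's handling of unsorted indexes or absent labels is not modeled.

```python
from bisect import bisect_left, bisect_right


def find_label_range_sketch(labels, first, last):
    """Model of find_label_range on a sorted index (hypothetical helper)."""
    begin = bisect_left(labels, first)  # first position holding a label >= first
    end = bisect_right(labels, last)    # one past the last position holding a label <= last
    return begin, end


labels = [10, 20, 20, 30, 40]
begin, end = find_label_range_sketch(labels, 20, 30)
print(begin, end)         # 1 4
print(labels[begin:end])  # [20, 20, 30]
print(labels[end - 1])    # 30, the last label in the range sits at end - 1
```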
-
classmethod
from_pandas(index, nan_as_null=None)¶ Convert from a Pandas Index.
- Parameters
- indexPandas Index object
A Pandas Index object which has to be converted to cuDF Index.
- nan_as_nullbool, Default None
If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.
- Raises
- TypeError for invalid input type.
Examples
>>> import cudf >>> import pandas as pd >>> import numpy as np >>> data = [10, 20, 30, np.nan] >>> pdi = pd.Index(data) >>> cudf.core.index.Index.from_pandas(pdi) Index(['10.0', '20.0', '30.0', 'null'], dtype='object') >>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False) Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
-
get_level_values(level)¶ Return an Index of values for requested level.
This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.
- Parameters
- levelint or str
It is either the integer position or the name of the level.
- Returns
- Index
Calling object, as there is only one level in the Index.
See also
cudf.core.multiindex.MultiIndex.get_level_valuesGet values for a level of a MultiIndex.
Notes
For Index, level should be 0, since there are no multiple levels.
Examples
>>> import cudf >>> idx = cudf.core.index.StringIndex(["a","b","c"]) >>> idx.get_level_values(0) StringIndex(['a' 'b' 'c'], dtype='object')
-
get_slice_bound(label, side, kind)¶ Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.
- Parameters
- labelobject
- side{‘left’, ‘right’}
- kind{‘ix’, ‘loc’, ‘getitem’}
- Returns
- int
Index of label.
-
property
gpu_values¶ View the data as a numba device array object
-
interleave_columns()¶ Interleave Series columns of a table into a single column.
Converts the column major table cols into a row major column.
- Parameters
- colsinput Table containing columns to interleave.
- Returns
- The interleaved columns as a single column
Examples
>>> import cudf >>> df = cudf.DataFrame({0: ['A1', 'A2', 'A3'], 1: ['B1', 'B2', 'B3']}) >>> df 0 1 0 A1 B1 1 A2 B2 2 A3 B3 >>> df.interleave_columns() 0 A1 1 B1 2 A2 3 B2 4 A3 5 B3
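The column-major to row-major reshaping can be sketched in plain Python. The sketch assumes the columns all have the same length, as the columns of one table do. `interleave_columns_sketch` is a hypothetical helper; it shows the output order, not cuDF's kernel.

```python
def interleave_columns_sketch(columns):
    """Flatten column-major data into one row-major list (hypothetical helper)."""
    # zip(*columns) yields one tuple per row; emitting each row's values in turn
    # places row 0 of every column first, then row 1, and so on.
    return [value for row in zip(*columns) for value in row]


columns = [["A1", "A2", "A3"], ["B1", "B2", "B3"]]
print(interleave_columns_sketch(columns))  # ['A1', 'B1', 'A2', 'B2', 'A3', 'B3']
```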
-
property
is_monotonic¶ Alias for is_monotonic_increasing.
-
property
is_monotonic_decreasing¶ Return if the index is monotonic decreasing (only equal or decreasing) values.
-
property
is_monotonic_increasing¶ Return if the index is monotonic increasing (only equal or increasing) values.
-
property
is_unique¶ Return if the index has unique values.
-
isin(values)¶ Return a boolean array where the index values are in values.
Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.
- Parameters
- valuesset, list-like, Index
Sought values.
- Returns
- is_containedcupy array
CuPy array of boolean values.
-
isna()¶ Identify missing values. Alias for isnull
-
isnull()¶ Identify missing values.
-
join(other, how='left', level=None, return_indexers=False, sort=False)¶ Compute join_index and indexers to conform data structures to the new index.
- Parameters
- otherIndex.
- how{‘left’, ‘right’, ‘inner’, ‘outer’}
- return_indexersbool, default False
- sortbool, default False
Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).
- Returns: index
Examples
>>> import cudf >>> lhs = cudf.DataFrame( ... {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b'] ... ).index >>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index >>> lhs.join(rhs, how='inner') MultiIndex(levels=[0 1 1 3 dtype: int64, 0 2 1 4 dtype: int64], codes= a b 0 1 1 1 0 0)
-
log()¶ Get the natural logarithm of all elements, element-wise.
Natural logarithm is the inverse of the exp function, so that x.log().exp() = x
- Returns
- DataFrame/Series/Index
Result of the element-wise natural logarithm.
Examples
>>> import cudf >>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100]) >>> ser 0 -1.00000 1 0.00000 2 1.00000 3 0.32434 4 0.50000 5 -10.00000 6 100.00000 dtype: float64 >>> ser.log() 0 NaN 1 -inf 2 0.000000 3 -1.125963 4 -0.693147 5 NaN 6 4.605170 dtype: float64
log operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5], ... 'second': [0.234, 0.3, 10]}) >>> df first second 0 -1.0 0.234 1 -10.0 0.300 2 0.5 10.000 >>> df.log() first second 0 NaN -1.452434 1 NaN -1.203973 2 -0.693147 2.302585
log operation on Index:
>>> index = cudf.Index([10, 11, 500.0]) >>> index Float64Index([10.0, 11.0, 500.0], dtype='float64') >>> index.log() Float64Index([2.302585092994046, 2.3978952727983707, 6.214608098422191], dtype='float64')
-
mask(cond, other=None, inplace=False)¶ Replace values where the condition is True.
- Parameters
- condbool Series/DataFrame, array-like
Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.
- other: scalar, list of scalars, Series/DataFrame
Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.
For a DataFrame, other must be a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.
For a Series, other must be a scalar or a Series-like object of the same length.
- inplacebool, default False
Whether to perform the operation in place on the data.
- Returns
- Same type as caller
Examples
>>> import cudf >>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]}) >>> df.mask(df % 2 == 0, [-1, -1]) A B 0 1 3 1 -1 5 2 5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0]) >>> ser.mask(ser > 2, 10) 0 10 1 10 2 2 3 1 4 0 dtype: int64 >>> ser.mask(ser > 2) 0 null 1 null 2 2 3 1 4 0 dtype: int64
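`mask` and `where` are complements: `mask` replaces values where the condition is True, and `where` replaces them where it is False. This plain-Python sketch shows that relationship. It uses hypothetical helpers, handles a scalar `other` only, and uses Python's None to stand in for null.

```python
def where_sketch(values, cond, other=None):
    """Keep values where cond is True, otherwise use `other` (hypothetical helper)."""
    return [v if c else other for v, c in zip(values, cond)]


def mask_sketch(values, cond, other=None):
    """mask is where with the condition inverted (hypothetical helper)."""
    return where_sketch(values, [not c for c in cond], other)


values = [4, 3, 2, 1, 0]
greater = [v > 2 for v in values]
print(where_sketch(values, greater, 15))  # [4, 3, 15, 15, 15]
print(mask_sketch(values, greater, 10))   # [10, 10, 2, 1, 0]
print(mask_sketch(values, greater))       # [None, None, 2, 1, 0]
```

The outputs line up with the `Index.where` and `Series.mask` examples on this page.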
-
max()¶ Return the maximum value of the Index.
- Returns
- scalar
Maximum value.
See also
Index.minReturn the minimum value in an Index.
cudf.core.series.Series.maxReturn the maximum value in a Series.
cudf.core.dataframe.DataFrame.maxReturn the maximum values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.max() 3
-
memory_usage(deep=False)¶ Memory usage of the values.
- Parameters
- deepbool
Introspect the data deeply, interrogate object dtypes for system-level memory consumption.
- Returns
- bytes used
-
min()¶ Return the minimum value of the Index.
- Returns
- scalar
Minimum value.
See also
Index.maxReturn the maximum value in an Index.
cudf.core.series.Series.minReturn the minimum value in a Series.
cudf.core.dataframe.DataFrame.minReturn the minimum values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.min() 1
-
property
name¶ Returns the name of the Index.
-
property
names¶ Returns a tuple containing the name of the Index.
-
property
ndim¶ Dimension of the data. Apart from MultiIndex ndim is always 1.
-
notna()¶ Identify non-missing values. Alias for notnull.
-
notnull()¶ Identify non-missing values.
-
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)¶ Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.
- Parameters
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
Index to direct ranking.
- method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’
How to rank the group of records that have the same value (i.e. ties):
* average: average rank of the group
* min: lowest rank in the group
* max: highest rank in the group
* first: ranks assigned in order they appear in the array
* dense: like ‘min’, but rank always increases by 1 between groups.
- numeric_onlybool, optional
For DataFrame objects, rank only numeric columns if set to True.
- na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’
How to rank NaN values:
* keep: assign NaN rank to NaN values
* top: assign smallest rank to NaN values if ascending
* bottom: assign highest rank to NaN values if ascending.
- ascendingbool, default True
Whether or not the elements should be ranked in ascending order.
- pctbool, default False
Whether or not to display the returned rankings in percentile form.
- Returns
- same type as caller
Return a Series or DataFrame with data ranks as values.
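The tie-breaking rules for `method` can be modeled in plain Python. `rank_sketch` is a hypothetical helper. It covers `method` and `ascending` only; nulls, `na_option`, and `pct` are not modeled. It returns floats, the way pandas reports ranks.

```python
def rank_sketch(values, method="average", ascending=True):
    """Model of the `method` tie-breaking rules (hypothetical helper; no nulls, no pct)."""
    n = len(values)
    # Stable sort of positions; reverse=True keeps ties in their original order.
    order = sorted(range(n), key=values.__getitem__, reverse=not ascending)
    ranks = [0.0] * n
    group = 0  # count of distinct values seen so far, used by "dense"
    start = 0
    while start < n:
        # Extend the tie group while the next sorted value equals the current one.
        stop = start
        while stop + 1 < n and values[order[stop + 1]] == values[order[start]]:
            stop += 1
        group += 1
        for k in range(start, stop + 1):
            if method == "average":
                rank = (start + stop) / 2 + 1
            elif method == "min":
                rank = start + 1
            elif method == "max":
                rank = stop + 1
            elif method == "first":
                rank = k + 1
            elif method == "dense":
                rank = group
            else:
                raise ValueError(f"unknown method: {method}")
            ranks[order[k]] = float(rank)
        start = stop + 1
    return ranks


data = [10, 20, 20, 30]
for m in ("average", "min", "max", "first", "dense"):
    print(m, rank_sketch(data, method=m))
# average [1.0, 2.5, 2.5, 4.0]
# min [1.0, 2.0, 2.0, 4.0]
# max [1.0, 3.0, 3.0, 4.0]
# first [1.0, 2.0, 3.0, 4.0]
# dense [1.0, 2.0, 2.0, 3.0]
```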
-
rename(name, inplace=False)¶ Alter Index name.
Defaults to returning new index.
- Parameters
- namelabel
Name(s) to set.
- Returns
- Index
-
repeat(repeats, axis=None)¶ Repeats elements consecutively.
Returns a new object of the caller's type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.
- Parameters
- repeatsint, or array of ints
The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.
- Returns
- Series/DataFrame/Index
A newly created object of same type as caller with repeated elements.
Examples
>>> import cudf >>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]}) >>> df a b 0 1 10 1 2 20 2 3 30 >>> df.repeat(3) a b 0 1 10 0 1 10 0 1 10 1 2 20 1 2 20 1 2 20 2 3 30 2 3 30 2 3 30
Repeat on Series
>>> s = cudf.Series([0, 2]) >>> s 0 0 1 2 dtype: int64 >>> s.repeat([3, 4]) 0 0 0 0 0 0 1 2 1 2 1 2 1 2 dtype: int64 >>> s.repeat(2) 0 0 0 0 1 2 1 2 dtype: int64
Repeat on Index
>>> index = cudf.Index([10, 22, 33, 55]) >>> index Int64Index([10, 22, 33, 55], dtype='int64') >>> index.repeat(5) Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33, 33, 33, 33, 33, 55, 55, 55, 55, 55], dtype='int64')
-
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)¶ Return a random sample of items from an axis of object.
You can use random_state for reproducibility.
- Parameters
- nint, optional
Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.
- fracfloat, optional
Fraction of axis items to return. Cannot be used with n.
- replacebool, default False
Allow or disallow sampling of the same row more than once. replace == True is not yet supported for axis = 1/‘columns’.
- weightsstr or ndarray-like, optional
Only supported for axis = 1/‘columns’.
- random_stateint or None, default None
Seed for the random number generator (if int), or None. If None, a random seed will be chosen.
- axis{0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index don’t support axis=1.
- Returns
- Series or DataFrame or Index
A new object of same type as caller containing n items randomly sampled from the caller object.
Examples
>>> import cudf >>> df = cudf.DataFrame({"a":[1, 2, 3, 4, 5]}) >>> df.sample(3) a 1 2 3 4 0 1
>>> sr = cudf.Series([1, 2, 3, 4, 5]) >>> sr.sample(10, replace=True) 1 4 3 1 2 4 0 5 0 1 4 5 4 1 0 2 0 3 3 2 dtype: int64
>>> df = cudf.DataFrame( ... {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]}) >>> df.sample(2, axis=1) a c 0 1 3 1 2 4
-
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)¶ Scatter to a list of dataframes.
Uses map_index to determine the destination of each row of the original DataFrame.
- Parameters
- map_indexSeries, str or list-like
Scatter assignment for each row
- map_sizeint
Length of output list. Must be >= uniques in map_index
- keep_indexbool
Conserve original index values for each row
- Returns
- A list of cudf.DataFrame objects.
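The routing rule can be sketched in plain Python. Row i goes to output list `map_index[i]`, and each output keeps its rows in their original order. The sketch assumes integer destinations and an explicit `map_size`. String column names and `keep_index` are not modeled. `scatter_by_map_sketch` is a hypothetical helper.

```python
def scatter_by_map_sketch(rows, map_index, map_size):
    """Route row i to output list map_index[i] (hypothetical helper, not cuDF's kernel)."""
    if len(rows) != len(map_index):
        raise ValueError("map_index needs exactly one entry per row")
    outputs = [[] for _ in range(map_size)]
    for row, dest in zip(rows, map_index):
        if not 0 <= dest < map_size:
            raise ValueError(f"destination {dest} is outside 0..{map_size - 1}")
        outputs[dest].append(row)  # rows keep their original relative order
    return outputs


rows = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
print(scatter_by_map_sketch(rows, [1, 0, 1, 0], 2))
# [[('b', 2), ('d', 4)], [('a', 1), ('c', 3)]]
print(scatter_by_map_sketch(rows, [0, 0, 0, 0], 2))
# [[('a', 1), ('b', 2), ('c', 3), ('d', 4)], []]
```

An output list can be empty, which is why `map_size` is a separate argument rather than being inferred from the values actually present.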
-
searchsorted(values, side='left', ascending=True, na_position='last')¶ Find indices where elements should be inserted to maintain order
- Parameters
- valuesFrame (shape must be consistent with self)
Values to be hypothetically inserted into self
- sidestr {‘left’, ‘right’} optional, default ‘left’
If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.
- ascendingbool optional, default True
Sorted Frame is in ascending order (otherwise descending)
- na_positionstr {‘last’, ‘first’} optional, default ‘last’
Position of null values in sorted order
- Returns
- 1-D cupy array of insertion points
Examples
>>> import cudf >>> s = cudf.Series([1, 2, 3]) >>> s.searchsorted(4) 3 >>> s.searchsorted([0, 4]) array([0, 3], dtype=int32) >>> s.searchsorted([1, 3], side='left') array([0, 2], dtype=int32) >>> s.searchsorted([1, 3], side='right') array([1, 3], dtype=int32)
If the values are not monotonically sorted, wrong locations may be returned:
>>> s = cudf.Series([2, 1, 3]) >>> s.searchsorted(1) 0 # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]}) >>> df a b 0 1 10 1 3 12 2 5 14 3 7 16 >>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6], ... 'b': [10, 11, 13, 15]}) >>> values_df a b 0 0 10 1 2 11 2 5 13 3 6 15 >>> df.searchsorted(values_df, ascending=False) array([4, 4, 4, 0], dtype=int32)
-
property
shape¶ Returns a tuple representing the dimensionality of the Index.
-
shift(periods=1, freq=None, axis=0, fill_value=None)¶ Shift values by periods positions.
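The entry gives no further detail. This is a minimal plain-Python sketch of the usual pandas-style `shift` semantics: values move by `periods` positions, and the vacated slots are filled with `fill_value`. The sketch uses None to stand in for null and ignores `freq` and `axis`. `shift_sketch` is a hypothetical helper, and it does not show whether a particular cuDF Index type supports `shift`.

```python
def shift_sketch(values, periods=1, fill_value=None):
    """Move values by `periods` positions, padding vacated slots (hypothetical helper)."""
    n = len(values)
    if abs(periods) >= n:
        return [fill_value] * n  # everything shifted out of range
    if periods >= 0:
        return [fill_value] * periods + list(values[: n - periods])
    return list(values[-periods:]) + [fill_value] * (-periods)


print(shift_sketch([1, 2, 3, 4]))                            # [None, 1, 2, 3]
print(shift_sketch([1, 2, 3, 4], periods=-1, fill_value=0))  # [2, 3, 4, 0]
```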
-
sin()¶ Get Trigonometric sine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360]) >>> ser 0 0.00000 1 0.32434 2 0.50000 3 45.00000 4 90.00000 5 180.00000 6 360.00000 dtype: float64 >>> ser.sin() 0 0.000000 1 0.318683 2 0.479426 3 0.850904 4 0.893997 5 -0.801153 6 0.958916 dtype: float64
sin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15], ... 'second': [100.0, 360, 720, 300]}) >>> df first second 0 0.0 100.0 1 5.0 360.0 2 10.0 720.0 3 15.0 300.0 >>> df.sin() first second 0 0.000000 -0.506366 1 -0.958924 0.958916 2 -0.544021 -0.544072 3 0.650288 -0.999756
sin operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90]) >>> index Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64') >>> index.sin() Float64Index([-0.3894183423086505, -0.5063656411097588, 0.8011526357338306, 0.8939966636005579], dtype='float64')
-
property
size¶ Return the number of elements in the underlying data.
- Returns
- sizeSize of the DataFrame / Index / Series / MultiIndex
Examples
Size of an empty dataframe is 0.
>>> import cudf >>> df = cudf.DataFrame() >>> df Empty DataFrame Columns: [] Index: [] >>> df.size 0 >>> df = cudf.DataFrame(index=[1, 2, 3]) >>> df Empty DataFrame Columns: [] Index: [1, 2, 3] >>> df.size 0
DataFrame with values
>>> df = cudf.DataFrame({'a': [10, 11, 12], ... 'b': ['hello', 'rapids', 'ai']}) >>> df a b 0 10 hello 1 11 rapids 2 12 ai >>> df.size 6 >>> df.index RangeIndex(start=0, stop=3) >>> df.index.size 3
Size of an Index
>>> index = cudf.Index([]) >>> index Float64Index([], dtype='float64') >>> index.size 0 >>> index = cudf.Index([1, 2, 3, 10]) >>> index Int64Index([1, 2, 3, 10], dtype='int64') >>> index.size 4
Size of a MultiIndex
>>> midx = cudf.MultiIndex(
...     levels=[["a", "b", "c", None], ["1", None, "5"]],
...     codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...     names=["x", "y"],
... )
>>> midx
MultiIndex(levels=[0       a
1       b
2       c
3    None
dtype: object, 0       1
1    None
2       5
dtype: object],
codes=   x  y
0  0  0
1  0  2
2  1  1
3  2  1
4  3  0)
>>> midx.size
5
-
sort_values(return_indexer=False, ascending=True, key=None)¶ Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
- Parameters
- return_indexer : bool, default False
Should the indices that would sort the index be returned.
- ascending : bool, default True
Should the index values be sorted in an ascending order.
- key : None, optional
This parameter is NON-FUNCTIONAL.
- Returns
- sorted_index : Index
Sorted copy of the index.
- indexer : cupy.ndarray, optional
The indices that the index itself was sorted by.
See also
cudf.core.series.Series.sort_values : Sort values of a Series.
cudf.core.dataframe.DataFrame.sort_values : Sort values in a DataFrame.
Examples
>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')
Sort values in ascending order (default behavior).
>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')
Sort values in descending order, and also get the indices idx was sorted by.
>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2],
                                                      dtype=int32))
-
sqrt()¶ Get the non-negative square-root of all elements, element-wise.
- Returns
- DataFrame/Series/Index
Result of the non-negative square-root of each element.
Examples
>>> import cudf
>>> ser = cudf.Series([10, 25, 81, 1.0, 100])
>>> ser
0     10.0
1     25.0
2     81.0
3      1.0
4    100.0
dtype: float64
>>> ser.sqrt()
0     3.162278
1     5.000000
2     9.000000
3     1.000000
4    10.000000
dtype: float64
sqrt operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-10.0, 100, 625],
...                      'second': [1, 2, 0.4]})
>>> df
   first  second
0  -10.0     1.0
1  100.0     2.0
2  625.0     0.4
>>> df.sqrt()
   first    second
0    NaN  1.000000
1   10.0  1.414214
2   25.0  0.632456
sqrt operation on Index:
>>> index = cudf.Index([-10.0, 100, 625])
>>> index
Float64Index([-10.0, 100.0, 625.0], dtype='float64')
>>> index.sqrt()
Float64Index([nan, 10.0, 25.0], dtype='float64')
-
property
str¶ Vectorized string functions for Series and Index.
This mimics the pandas df.str interface. Nulls stay null unless handled otherwise by a particular method. Patterned after Python’s string methods, with some inspiration from R’s stringr package.
-
sum()¶ Return the sum of all values of the Index.
- Returns
- scalar
Sum of all values.
Examples
>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.sum()
6
-
take(indices)¶ Gather only the specific subset of indices
- Parameters
- indices: An array-like that maps to values contained in this Index.
-
tan()¶ Get Trigonometric tangent, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.tan()
0    0.000000
1    0.336213
2    0.546302
3    1.619775
4   -1.995200
5    1.338690
6   -3.380140
dtype: float64
tan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.tan()
      first     second
0  0.000000  -0.587214
1 -3.380515  -3.380140
2  0.648361   0.648446
3 -0.855993  45.244742
tan operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.tan()
Float64Index([-0.4227932187381618, -0.587213915156929, -1.3386902103511544, -1.995200412208242], dtype='float64')
-
tile(count)¶ Repeats the rows from self DataFrame count times to form a new DataFrame.
- Parameters
- self : input Table containing columns to tile.
- count : Number of times to tile “rows”. Must be non-negative.
- Returns
- The table containing the tiled “rows”.
Examples
>>> import cudf
>>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]])
>>> count = 2
>>> df.tile(count)
   0  1  2
0  8  4  7
1  5  2  3
0  8  4  7
1  5  2  3
-
to_array(fillna=None)¶ Get a dense numpy array for the data.
- Parameters
- fillna : str or None
Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.
Notes
If fillna is None, null values are skipped. Therefore, the output size could be smaller.
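The following sketch contrasts the two fillna modes using a Python list in which None stands in for a null. to_dense is a hypothetical helper. Casting every value to float in the "pandas" mode is an assumption made so that NaN can be stored alongside integers, as NumPy requires.

```python
import math


def to_dense(values, fillna=None):
    """Hypothetical model of to_array(fillna=...); None marks a null (not cuDF code)."""
    if fillna is None:
        # Nulls are dropped, so the result may be shorter than the input.
        return [v for v in values if v is not None]
    if fillna == "pandas":
        # Nulls become NaN; values are cast to float so NaN can be represented.
        return [math.nan if v is None else float(v) for v in values]
    raise ValueError("fillna must be None or 'pandas'")


print(to_dense([1, None, 3]))            # [1, 3]
print(to_dense([1, None, 3], "pandas"))  # [1.0, nan, 3.0]
```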
-
to_arrow()¶ Convert Index to a PyArrow Array.
Examples
>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx.to_arrow()
<pyarrow.lib.Int64Array object at 0x7fcaa6f53440>
[
  -3,
  10,
  15,
  20
]
-
to_dlpack()¶ Converts a cuDF object into a DLPack tensor.
DLPack is an open-source memory tensor structure: dmlc/dlpack.
This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.
- Parameters
- cudf_obj : DataFrame, Series, Index, or Column
- Returns
- pycapsule_obj : PyCapsule
Output DLPack tensor pointer which is encapsulated in a PyCapsule object.
-
to_frame(index=True, name=None)¶ Create a DataFrame with a column containing this Index.
- Parameters
- index : boolean, default True
Set the index of the returned DataFrame as the original Index.
- name : str, default None
Name to be used for the column.
- Returns
- DataFrame
cudf DataFrame
-
to_pandas()¶ Convert to a Pandas Index.
Examples
>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.numeric.Int64Index'>
>>> type(idx)
<class 'cudf.core.index.Int64Index'>
-
to_series(index=None, name=None)¶ Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.
- Parameters
- index : Index, optional
Index of resulting Series. If None, defaults to original index.
- name : str, optional
Name of resulting Series. If None, defaults to name of original index.
- Returns
- Series
The dtype will be based on the type of the Index values.
-
unique()¶ Return unique values in the index.
- Returns
- Index without duplicates
-
property
values¶ Return an array representing the data in the Index.
- Returns
- array : A cupy array of data in the Index.
Examples
>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values
array([  1, -10, 100,  20])
>>> type(index.values)
<class 'cupy.core.core.ndarray'>
-
property
values_host¶ Return a numpy representation of the Index.
Only the values in the Index will be returned.
- Returns
- out : numpy.ndarray
The values of the Index.
Examples
>>> import cudf
>>> index = cudf.Index([1, -10, 100, 20])
>>> index.values_host
array([  1, -10, 100,  20])
>>> type(index.values_host)
<class 'numpy.ndarray'>
-
where(cond, other=None)¶ Replace values where the condition is False.
- Parameters
- cond : bool array-like with the same length as self
Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.
- other : scalar, or array-like
Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.
- Returns
- Same type as caller
Examples
>>> import cudf
>>> index = cudf.Index([4, 3, 2, 1, 0])
>>> index
Int64Index([4, 3, 2, 1, 0], dtype='int64')
>>> index.where(index > 2, 15)
Int64Index([4, 3, 15, 15, 15], dtype='int64')
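The keep-or-replace rule of where can be modeled element by element on plain lists. The broadcasting of a scalar other and the use of None as null are assumptions of this sketch. where_list is a hypothetical helper, not a cuDF API.

```python
def where_list(values, cond, other=None):
    """Hypothetical model of Index.where: keep values where cond is True, else use other."""
    if len(cond) != len(values):
        raise ValueError("cond must have the same length as the data")
    # A scalar `other` is broadcast; a list-like `other` is matched position by position.
    others = other if isinstance(other, (list, tuple)) else [other] * len(values)
    return [v if c else o for v, c, o in zip(values, cond, others)]


data = [4, 3, 2, 1, 0]
print(where_list(data, [v > 2 for v in data], 15))  # [4, 3, 15, 15, 15]
```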
DatetimeIndex¶
-
class
cudf.core.index.DatetimeIndex(data=None, freq=None, tz=None, normalize=False, closed=None, ambiguous='raise', dayfirst=False, yearfirst=False, dtype=None, copy=False, name=None)¶ Immutable, ordered and sliceable sequence of integer labels. The basic object storing row labels for all cuDF objects.
- Parameters
- data : array-like (1-dimensional) / DataFrame
If it is a DataFrame, it will return a MultiIndex.
- dtype : NumPy dtype (default: object)
If dtype is None, we find the dtype that best fits the data.
- copy : bool
Make a copy of input data.
- name : object
Name to be stored in the index.
- tupleize_cols : bool (default: True)
When True, attempt to create a MultiIndex if possible. tupleize_cols == False is not yet supported.
- Returns
- Index
cudf Index
Examples
>>> import cudf
>>> cudf.Index([1, 2, 3], dtype="uint64", name="a")
UInt64Index([1, 2, 3], dtype='uint64', name='a')
>>> cudf.Index(cudf.DataFrame({"a":[1, 2], "b":[2, 3]}))
MultiIndex(levels=[0    1
1    2
dtype: int64, 0    2
1    3
dtype: int64],
codes=   a  b
0  0  0
1  1  1)
- Attributes
day
dtype : dtype of the underlying values in GenericIndex.
empty : Indicator whether Index is empty.
gpu_values : View the data as a numba device array object.
hour
is_monotonic : Alias for is_monotonic_increasing.
is_monotonic_decreasing : Return if the index is monotonic decreasing (only equal or decreasing) values.
is_monotonic_increasing : Return if the index is monotonic increasing (only equal or increasing) values.
is_unique : Return if the index has unique values.
minute
month
name : Returns the name of the Index.
names : Returns a tuple containing the name of the Index.
ndim : Dimension of the data.
second
shape : Returns a tuple representing the dimensionality of the Index.
size : Return the number of elements in the underlying data.
values : Return an array representing the data in the Index.
values_host : Return a numpy representation of the Index.
weekday
year
Methods
acos() : Get Trigonometric inverse cosine, element-wise.
any() : Return whether any element is True in Index.
append(other) : Append a collection of Index options together.
argsort([ascending]) : Return the integer indices that would sort the index.
asin() : Get Trigonometric inverse sine, element-wise.
astype(dtype[, copy]) : Create an Index with values cast to dtypes.
atan() : Get Trigonometric inverse tangent, element-wise.
clip([lower, upper, inplace, axis]) : Trim values at input threshold(s).
copy([deep]) : Make a copy of this object.
cos() : Get Trigonometric cosine, element-wise.
difference(other[, sort]) : Return a new Index with elements from the index that are not in other.
drop_duplicates([keep]) : Return Index with duplicate values removed.
dropna([how]) : Return an Index with null values removed.
equals(other) : Determine if two Index objects contain the same elements.
exp() : Get the exponential of all elements, element-wise.
fillna(value[, downcast]) : Fill null values with the specified value.
find_label_range(first, last) : Find range that starts with first and ends with last, inclusively.
from_pandas(index[, nan_as_null]) : Convert from a Pandas Index.
get_level_values(level) : Return an Index of values for requested level.
get_slice_bound(label, side, kind) : Calculate slice bound that corresponds to given label.
interleave_columns() : Interleave Series columns of a table into a single column.
isin(values) : Return a boolean array where the index values are in values.
isna() : Identify missing values.
isnull() : Identify missing values.
join(other[, how, level, return_indexers, sort]) : Compute join_index and indexers to conform data structures to the new index.
log() : Get the natural logarithm of all elements, element-wise.
mask(cond[, other, inplace]) : Replace values where the condition is True.
max() : Return the maximum value of the Index.
memory_usage([deep]) : Memory usage of the values.
min() : Return the minimum value of the Index.
notna() : Identify non-missing values.
notnull() : Identify non-missing values.
rank([axis, method, numeric_only, …]) : Compute numerical data ranks (1 through n) along axis.
rename(name[, inplace]) : Alter Index name.
repeat(repeats[, axis]) : Repeats elements consecutively.
sample([n, frac, replace, weights, …]) : Return a random sample of items from an axis of object.
scatter_by_map(map_index[, map_size, keep_index]) : Scatter to a list of dataframes.
searchsorted(values[, side, ascending, …]) : Find indices where elements should be inserted to maintain order.
shift([periods, freq, axis, fill_value]) : Shift values by periods positions.
sin() : Get Trigonometric sine, element-wise.
sort_values([return_indexer, ascending, key]) : Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
sqrt() : Get the non-negative square-root of all elements, element-wise.
sum() : Return the sum of all values of the Index.
take(indices) : Gather only the specific subset of indices.
tan() : Get Trigonometric tangent, element-wise.
tile(count) : Repeats the rows from self DataFrame count times to form a new DataFrame.
to_array([fillna]) : Get a dense numpy array for the data.
to_arrow() : Convert Index to a PyArrow Array.
to_dlpack() : Converts a cuDF object into a DLPack tensor.
to_frame([index, name]) : Create a DataFrame with a column containing this Index.
to_pandas() : Convert to a Pandas Index.
to_series([index, name]) : Create a Series with both index and values equal to the index keys.
unique() : Return unique values in the index.
where(cond[, other]) : Replace values where the condition is False.
get_dt_field
replace
-
acos()¶ Get Trigonometric inverse cosine, element-wise.
The inverse of cos, so that if y = x.cos(), then x = y.acos() for x in the principal range [0, π].
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.acos()
0    3.141593
1    1.570796
2    0.000000
3    1.240482
4    1.047198
dtype: float64
acos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.acos()
      first    second
0  3.141593  1.334606
1  1.570796  1.266104
2  1.047198  1.470629
acos operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.acos()
Float64Index([3.141592653589793, 1.1592794807274085, 0.0, 1.5707963267948966, 1.266103672779499], dtype='float64')
-
any()¶ Return whether any element is True in Index.
-
append(other)¶ Append a collection of Index options together.
- Parameters
- other : Index or list/tuple of indices
- Returns
- appended : Index
Examples
>>> import cudf
>>> idx = cudf.Index([1, 2, 10, 100])
>>> idx
Int64Index([1, 2, 10, 100], dtype='int64')
>>> other = cudf.Index([200, 400, 50])
>>> other
Int64Index([200, 400, 50], dtype='int64')
>>> idx.append(other)
Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')
append also accepts a list of Index objects:
>>> idx.append([other, other])
Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
-
argsort(ascending=True, **kwargs)¶ Return the integer indices that would sort the index.
- Parameters
- ascending : bool, default True
If True, returns the indices for ascending order. If False, returns the indices for descending order.
- Returns
- array : A cupy array containing integer indices that would sort the index if used as an indexer.
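The meaning of the returned indexer can be checked with a list-based model. Gathering the values at the returned positions yields them in sorted order. argsort_list is a hypothetical helper. It relies on Python's stable sort, so tie-breaking may differ from cuDF.

```python
def argsort_list(values, ascending=True):
    """Hypothetical model of Index.argsort: positions that would sort `values`."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=not ascending)


vals = [10, 100, 1, 1000]
order = argsort_list(vals)
print(order)                                # [2, 0, 1, 3]
print([vals[i] for i in order])             # [1, 10, 100, 1000]
print(argsort_list(vals, ascending=False))  # [3, 1, 0, 2]
```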
-
asin()¶ Get Trigonometric inverse sine, element-wise.
The inverse of sine, so that if y = x.sin(), then x = y.asin() for x in the principal range [-π/2, π/2].
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5])
>>> ser.asin()
0   -1.570796
1    0.000000
2    1.570796
3    0.330314
4    0.523599
dtype: float64
asin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, 0, 0.5],
...                      'second': [0.234, 0.3, 0.1]})
>>> df
   first  second
0   -1.0   0.234
1    0.0   0.300
2    0.5   0.100
>>> df.asin()
      first    second
0 -1.570796  0.236190
1  0.000000  0.304693
2  0.523599  0.100167
asin operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.3], dtype='float64')
>>> index.asin()
Float64Index([-1.5707963267948966, 0.41151684606748806, 1.5707963267948966, 0.3046926540153975], dtype='float64')
-
astype(dtype, copy=False)¶ Create an Index with values cast to dtypes. The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.
- Parameters
- dtype : numpy dtype
Use a numpy.dtype to cast entire Index object to.
- copy : bool, default False
By default, astype always returns a newly allocated object. If copy is set to False and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.
- Returns
- Index
Index with values cast to specified dtype.
-
atan()¶ Get Trigonometric inverse tangent, element-wise.
The inverse of tan, so that if y = x.tan(), then x = y.atan() for x in the principal range (-π/2, π/2).
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10])
>>> ser
0    -1.00000
1     0.00000
2     1.00000
3     0.32434
4     0.50000
5   -10.00000
dtype: float64
>>> ser.atan()
0   -0.785398
1    0.000000
2    0.785398
3    0.313635
4    0.463648
5   -1.471128
dtype: float64
atan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.atan()
      first    second
0 -0.785398  0.229864
1 -1.471128  0.291457
2  0.463648  1.471128
atan operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.atan()
Float64Index([-0.7853981633974483, 0.3805063771123649, 0.7853981633974483, 0.0, 0.2914567944778671], dtype='float64')
-
clip(lower=None, upper=None, inplace=False, axis=1)¶ Trim values at input threshold(s).
Assigns values outside boundary to boundary values. Thresholds can be singular values or array-like, and in the latter case the clipping is performed element-wise in the specified axis. Currently only axis=1 is supported.
- Parameters
- lower : scalar or array_like, default None
Minimum threshold value. All values below this threshold will be set to it. If it is None, there will be no clipping based on lower. In case of Series/Index, lower is expected to be a scalar or an array of size 1.
- upper : scalar or array_like, default None
Maximum threshold value. All values above this threshold will be set to it. If it is None, there will be no clipping based on upper. In case of Series/Index, upper is expected to be a scalar or an array of size 1.
- inplace : bool, default False
Whether to perform the operation in place on the data.
- Returns
- Clipped DataFrame/Series/Index/MultiIndex
Examples
>>> import cudf
>>> df = cudf.DataFrame({"a":[1, 2, 3, 4], "b":['a', 'b', 'c', 'd']})
>>> df.clip(lower=[2, 'b'], upper=[3, 'c'])
   a  b
0  2  b
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=None, upper=[3, 'c'])
   a  b
0  1  a
1  2  b
2  3  c
3  3  c
>>> df.clip(lower=[2, 'b'], upper=None)
   a  b
0  2  b
1  2  b
2  3  c
3  4  d
>>> df.clip(lower=2, upper=3, inplace=True)
>>> df
   a  b
0  2  2
1  2  3
2  3  3
3  3  3
>>> import cudf
>>> sr = cudf.Series([1, 2, 3, 4])
>>> sr.clip(lower=2, upper=3)
0    2
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=None, upper=3)
0    1
1    2
2    3
3    3
dtype: int64
>>> sr.clip(lower=2, upper=None, inplace=True)
>>> sr
0    2
1    2
2    3
3    4
dtype: int64
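For the scalar-threshold case on a Series or Index, the rule reduces to raising values below lower and lowering values above upper. clip_list is a hypothetical list-based helper. It does not model array-like thresholds, inplace, or axis.

```python
def clip_list(values, lower=None, upper=None):
    """Hypothetical scalar-threshold model of Series.clip; None disables a bound."""
    out = []
    for v in values:
        if lower is not None and v < lower:
            v = lower  # values below the lower threshold are raised to it
        if upper is not None and v > upper:
            v = upper  # values above the upper threshold are lowered to it
        out.append(v)
    return out


print(clip_list([1, 2, 3, 4], lower=2, upper=3))  # [2, 2, 3, 3]
print(clip_list([1, 2, 3, 4], upper=3))           # [1, 2, 3, 3]
```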
-
copy(deep=True)¶ Make a copy of this object.
- Parameters
- deep : bool, default True
Make a deep copy of the data. With deep=False, the original data is not copied.
- Returns
- copyIndex
-
cos()¶ Get Trigonometric cosine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf
>>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360])
>>> ser
0      0.00000
1      0.32434
2      0.50000
3     45.00000
4     90.00000
5    180.00000
6    360.00000
dtype: float64
>>> ser.cos()
0    1.000000
1    0.947861
2    0.877583
3    0.525322
4   -0.448074
5   -0.598460
6   -0.283691
dtype: float64
cos operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15],
...                      'second': [100.0, 360, 720, 300]})
>>> df
   first  second
0    0.0   100.0
1    5.0   360.0
2   10.0   720.0
3   15.0   300.0
>>> df.cos()
      first    second
0  1.000000  0.862319
1  0.283662 -0.283691
2 -0.839072 -0.839039
3 -0.759688 -0.022097
cos operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90])
>>> index
Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64')
>>> index.cos()
Float64Index([0.9210609940028851, 0.8623188722876839, -0.5984600690578581, -0.4480736161291701], dtype='float64')
-
difference(other, sort=None)¶ Return a new Index with elements from the index that are not in other.
This is the set difference of two Index objects.
- Parameters
- other : Index or array-like
- sort : False or None, default None
Whether to sort the resulting index. By default, the values are attempted to be sorted, but any TypeError from incomparable elements is caught by cudf.
None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.
False : Do not sort the result.
- Returns
- difference : Index
Examples
>>> import cudf
>>> idx1 = cudf.Index([2, 1, 3, 4])
>>> idx1
Int64Index([2, 1, 3, 4], dtype='int64')
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx2
Int64Index([3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
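The sort=None versus sort=False behavior described above can be sketched with Python sets. Incomparable elements leave the result unsorted instead of raising. index_difference is a hypothetical helper. It assumes, as in pandas, that each value appears once in the result, and it keeps first-appearance order when unsorted.

```python
def index_difference(left, right, sort=None):
    """Hypothetical model of Index.difference (set difference, duplicates removed)."""
    exclude = set(right)
    seen = set()
    result = []
    for v in left:
        if v not in exclude and v not in seen:
            seen.add(v)
            result.append(v)  # first-appearance order, as with sort=False
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            pass  # incomparable elements: keep the unsorted result
    return result


print(index_difference([2, 1, 3, 4], [3, 4, 5, 6]))              # [1, 2]
print(index_difference([2, 1, 3, 4], [3, 4, 5, 6], sort=False))  # [2, 1]
```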
-
drop_duplicates(keep='first')¶ Return Index with duplicate values removed
- Parameters
- keep : {‘first’, ‘last’, False}, default ‘first’
‘first’ : Drop duplicates except for the first occurrence.
‘last’ : Drop duplicates except for the last occurrence.
False : Drop all duplicates.
- Returns
- deduplicated : Index
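The three keep modes can be contrasted on a plain list. drop_duplicates_list is a hypothetical helper that reports survivors in their original order. The output order of cuDF's own implementation may differ, so treat this as a model of which values survive, not of ordering.

```python
def drop_duplicates_list(values, keep="first"):
    """Hypothetical model of Index.drop_duplicates with keep='first'|'last'|False."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    if keep is False:
        # Drop every value that occurs more than once.
        return [v for v in values if counts[v] == 1]
    positions = range(len(values)) if keep == "first" else reversed(range(len(values)))
    seen, kept = set(), []
    for i in positions:
        if values[i] not in seen:
            seen.add(values[i])
            kept.append(i)
    # Report survivors in their original order.
    return [values[i] for i in sorted(kept)]


data = ["a", "b", "a", "c", "b"]
print(drop_duplicates_list(data))               # ['a', 'b', 'c']
print(drop_duplicates_list(data, keep="last"))  # ['a', 'c', 'b']
print(drop_duplicates_list(data, keep=False))   # ['c']
```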
-
dropna(how='any')¶ Return an Index with null values removed.
- Parameters
- how : {‘any’, ‘all’}, default ‘any’
If the Index is a MultiIndex, drop the value when any or all levels are NaN.
- Returns
- valid : Index
Examples
>>> import cudf
>>> index = cudf.Index(['a', None, 'b', 'c'])
>>> index
StringIndex(['a' None 'b' 'c'], dtype='object')
>>> index.dropna()
StringIndex(['a' 'b' 'c'], dtype='object')
Using dropna on a MultiIndex:
>>> midx = cudf.MultiIndex(
...     levels=[[1, None, 4, None], [1, 2, 5]],
...     codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...     names=["x", "y"],
... )
>>> midx
MultiIndex(levels=[0       1
1    null
2       4
3    null
dtype: int64, 0    1
1    2
2    5
dtype: int64],
codes=   x  y
0  0  0
1  0  2
2  1  1
3  2  1
4  3  0)
>>> midx.dropna()
MultiIndex(levels=[0    1
1    4
dtype: int64, 0    1
1    2
2    5
dtype: int64],
codes=   x  y
0  0  0
1  0  2
2  1  1)
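For a MultiIndex, the any/all choice can be pictured on the decoded label tuples, with None marking a null level value. The rows below are the (x, y) labels obtained by applying the codes to the levels in the MultiIndex example above. dropna_rows is a hypothetical helper, not a cuDF API.

```python
def dropna_rows(rows, how="any"):
    """Hypothetical model of MultiIndex.dropna over label tuples (None marks null)."""
    if how == "any":
        return [r for r in rows if not any(v is None for v in r)]
    if how == "all":
        return [r for r in rows if not all(v is None for v in r)]
    raise ValueError("how must be 'any' or 'all'")


# (x, y) labels decoded from levels=[[1, None, 4, None], [1, 2, 5]] and the codes above.
rows = [(1, 1), (1, 5), (None, 2), (4, 2), (None, 1)]
print(dropna_rows(rows))         # [(1, 1), (1, 5), (4, 2)]
print(len(dropna_rows(rows, "all")))  # 5, since no row is null in every level
```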
-
property
dtype¶ dtype of the underlying values in GenericIndex.
-
property
empty¶ Indicator whether Index is empty.
True if Index is entirely empty (no items).
- Returns
- out : bool
True if the Index is empty, False otherwise.
-
equals(other)¶ Determine if two Index objects contain the same elements.
- Returns
- out: bool
True if “other” is an Index and it has the same elements as calling index; False otherwise.
-
exp()¶ Get the exponential of all elements, element-wise.
Exponential is the inverse of the log function, so that x.exp().log() = x
- Returns
- DataFrame/Series/Index
Result of the element-wise exponential.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.exp()
0    3.678794e-01
1    1.000000e+00
2    2.718282e+00
3    1.383117e+00
4    1.648721e+00
5    4.539993e-05
6    2.688117e+43
dtype: float64
exp operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.exp()
      first        second
0  0.367879      1.263644
1  0.000045      1.349859
2  1.648721  22026.465795
exp operation on Index:
>>> index = cudf.Index([-1, 0.4, 1, 0, 0.3])
>>> index
Float64Index([-1.0, 0.4, 1.0, 0.0, 0.3], dtype='float64')
>>> index.exp()
Float64Index([0.36787944117144233, 1.4918246976412703, 2.718281828459045, 1.0, 1.3498588075760032], dtype='float64')
-
fillna(value, downcast=None)¶ Fill null values with the specified value.
- Parameters
- value : scalar
Scalar value to use to fill nulls. This value cannot be list-like.
- downcast : dict, default is None
This parameter is currently NON-FUNCTIONAL.
- Returns
- filled : Index
Examples
>>> import cudf
>>> index = cudf.Index([1, 2, None, 4])
>>> index
Int64Index([1, 2, null, 4], dtype='int64')
>>> index.fillna(3)
Int64Index([1, 2, 3, 4], dtype='int64')
-
find_label_range(first, last)¶ Find range that starts with first and ends with last, inclusively.
- Returns
- begin, end : 2-tuple of int
The starting index and the ending index. The last value occurs at position end - 1.
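On a sorted (monotonic increasing) index, the inclusive label range can be modeled with binary search. begin is the first position whose label is >= first, and end is one past the last position whose label is <= last, which agrees with the end - 1 convention above. find_label_range_sorted is a hypothetical helper. The sorted-input requirement is an assumption of this sketch.

```python
from bisect import bisect_left, bisect_right


def find_label_range_sorted(labels, first, last):
    """Hypothetical model of find_label_range for a sorted (monotonic increasing) index."""
    begin = bisect_left(labels, first)  # first position whose label is >= first
    end = bisect_right(labels, last)    # one past the last position whose label is <= last
    return begin, end


labels = [1, 3, 3, 5, 8]
begin, end = find_label_range_sorted(labels, 3, 5)
print(begin, end)         # 1 4
print(labels[begin:end])  # [3, 3, 5]
```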
-
classmethod
from_pandas(index, nan_as_null=None)¶ Convert from a Pandas Index.
- Parameters
- index : Pandas Index object
A Pandas Index object which has to be converted to cuDF Index.
- nan_as_null : bool, Default None
If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.
- Raises
- TypeError for invalid input type.
Examples
>>> import cudf
>>> import pandas as pd
>>> import numpy as np
>>> data = [10, 20, 30, np.nan]
>>> pdi = pd.Index(data)
>>> cudf.core.index.Index.from_pandas(pdi)
Float64Index([10.0, 20.0, 30.0, null], dtype='float64')
>>> cudf.core.index.Index.from_pandas(pdi, nan_as_null=False)
Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
-
get_level_values(level)¶ Return an Index of values for requested level.
This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.
- Parameters
- level : int or str
It is either the integer position or the name of the level.
- Returns
- Index
Calling object, as there is only one level in the Index.
See also
cudf.core.multiindex.MultiIndex.get_level_values : Get values for a level of a MultiIndex.
Notes
For Index, level should be 0, since there are no multiple levels.
Examples
>>> import cudf
>>> idx = cudf.core.index.StringIndex(["a","b","c"])
>>> idx.get_level_values(0)
StringIndex(['a' 'b' 'c'], dtype='object')
-
get_slice_bound(label, side, kind)¶ Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.
- Parameters
- label : object
- side : {‘left’, ‘right’}
- kind : {‘ix’, ‘loc’, ‘getitem’}
- Returns
- int
Index of label.
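For a sorted index, the two side values map onto left and right binary search. side='left' gives the leftmost position of the label, and side='right' gives one past its rightmost position. slice_bound is a hypothetical helper. It assumes sorted labels and ignores kind.

```python
from bisect import bisect_left, bisect_right


def slice_bound(labels, label, side):
    """Hypothetical model of get_slice_bound for a sorted index (kind is ignored)."""
    if side == "left":
        return bisect_left(labels, label)   # leftmost position of label
    if side == "right":
        return bisect_right(labels, label)  # one past the rightmost position
    raise ValueError("side must be 'left' or 'right'")


labels = [10, 20, 20, 30]
print(slice_bound(labels, 20, "left"))   # 1
print(slice_bound(labels, 20, "right"))  # 3
```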
-
property
gpu_values¶ View the data as a numba device array object
-
interleave_columns()¶ Interleave Series columns of a table into a single column.
Converts the column-major table cols into a row-major column.
- Parameters
- cols : input Table containing columns to interleave.
- Returns
- The interleaved columns as a single column.
Examples
>>> import cudf
>>> df = cudf.DataFrame({0: ['A1', 'A2', 'A3'], 1: ['B1', 'B2', 'B3']})
>>> df
    0   1
0  A1  B1
1  A2  B2
2  A3  B3
>>> df.interleave_columns()
0    A1
1    B1
2    A2
3    B2
4    A3
5    B3
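Conceptually, interleaving reads the table row by row. The first value of every column comes first, then the second value of every column, and so on. interleave is a hypothetical list-based helper, and the equal-length check is an assumption of this sketch.

```python
def interleave(columns):
    """Hypothetical model of interleave_columns: read a column-major table row by row."""
    if len({len(c) for c in columns}) > 1:
        raise ValueError("all columns must have the same length")
    # zip(*columns) yields one tuple per row; flattening gives row-major order.
    return [value for row in zip(*columns) for value in row]


print(interleave([["A1", "A2", "A3"], ["B1", "B2", "B3"]]))
# ['A1', 'B1', 'A2', 'B2', 'A3', 'B3']
```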
-
property
is_monotonic¶ Alias for is_monotonic_increasing.
-
property
is_monotonic_decreasing¶ Return if the index is monotonic decreasing (only equal or decreasing) values.
-
property
is_monotonic_increasing¶ Return if the index is monotonic increasing (only equal or increasing) values.
-
property
is_unique¶ Return if the index has unique values.
-
isin(values)¶ Return a boolean array where the index values are in values.
Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.
- Parameters
- values : set, list-like, Index
Sought values.
- Returns
- is_contained : cupy array
CuPy array of boolean values.
-
isna()¶ Identify missing values. Alias for isnull
-
isnull()¶ Identify missing values.
-
join(other, how='left', level=None, return_indexers=False, sort=False)¶ Compute join_index and indexers to conform data structures to the new index.
- Parameters
- other : Index.
- how : {‘left’, ‘right’, ‘inner’, ‘outer’}
- return_indexers : bool, default False
- sort : bool, default False
Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).
- Returns: index
Examples
>>> import cudf
>>> lhs = cudf.DataFrame(
...     {"a":[2, 3, 1], "b":[3, 4, 2]}).set_index(['a', 'b']
... ).index
>>> rhs = cudf.DataFrame({"a":[1, 4, 3]}).set_index('a').index
>>> lhs.join(rhs, how='inner')
MultiIndex(levels=[0    1
1    3
dtype: int64, 0    2
1    4
dtype: int64],
codes=   a  b
0  1  1
1  0  0)
-
log()¶ Get the natural logarithm of all elements, element-wise.
Natural logarithm is the inverse of the exp function, so that x.log().exp() = x for x > 0.
- Returns
- DataFrame/Series/Index
Result of the element-wise natural logarithm.
Examples
>>> import cudf
>>> ser = cudf.Series([-1, 0, 1, 0.32434, 0.5, -10, 100])
>>> ser
0     -1.00000
1      0.00000
2      1.00000
3      0.32434
4      0.50000
5    -10.00000
6    100.00000
dtype: float64
>>> ser.log()
0         NaN
1        -inf
2    0.000000
3   -1.125963
4   -0.693147
5         NaN
6    4.605170
dtype: float64
log operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-1, -10, 0.5],
...                      'second': [0.234, 0.3, 10]})
>>> df
   first  second
0   -1.0   0.234
1  -10.0   0.300
2    0.5  10.000
>>> df.log()
      first    second
0       NaN -1.452434
1       NaN -1.203973
2 -0.693147  2.302585
log operation on Index:
>>> index = cudf.Index([10, 11, 500.0])
>>> index
Float64Index([10.0, 11.0, 500.0], dtype='float64')
>>> index.log()
Float64Index([2.302585092994046, 2.3978952727983707, 6.214608098422191], dtype='float64')
-
mask(cond, other=None, inplace=False)¶ Replace values where the condition is True.
- Parameters
- cond : bool Series/DataFrame, array-like
Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.
- other : scalar, list of scalars, Series/DataFrame
Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.
DataFrame expects only a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.
Series expects only a scalar or a Series-like object of the same length.
- inplace : bool, default False
Whether to perform the operation in place on the data.
- Returns
- Same type as caller
Examples
>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.mask(df % 2 == 0, [-1, -1])
   A  B
0  1  3
1 -1  5
2  5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.mask(ser > 2, 10)
0    10
1    10
2     2
3     1
4     0
dtype: int64
>>> ser.mask(ser > 2)
0    null
1    null
2       2
3       1
4       0
dtype: int64
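mask is the complement of where: a value is replaced exactly where cond is True. A list-based model, with None standing in for null, reproduces the Series example above. mask_list is a hypothetical helper, not a cuDF API.

```python
def mask_list(values, cond, other=None):
    """Hypothetical model of Series.mask: replace values where cond is True."""
    others = other if isinstance(other, (list, tuple)) else [other] * len(values)
    # Replacement happens where cond is True; elsewhere the original value is kept.
    return [o if c else v for v, c, o in zip(values, cond, others)]


data = [4, 3, 2, 1, 0]
print(mask_list(data, [v > 2 for v in data], 10))  # [10, 10, 2, 1, 0]
print(mask_list(data, [v > 2 for v in data]))      # [None, None, 2, 1, 0]
```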
-
max()¶ Return the maximum value of the Index.
- Returns
- scalar
Maximum value.
See also
Index.min : Return the minimum value in an Index.
cudf.core.series.Series.max : Return the maximum value in a Series.
cudf.core.dataframe.DataFrame.max : Return the maximum values in a DataFrame.
Examples
>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.max()
3
-
memory_usage(deep=False)¶ Memory usage of the values.
- Parameters
- deep : bool
Introspect the data deeply, interrogate object dtypes for system-level memory consumption.
- Returns
- bytes used
-
min()¶ Return the minimum value of the Index.
- Returns
- scalar
Minimum value.
See also
Index.max : Return the maximum value in an Index.
cudf.core.series.Series.min : Return the minimum value in a Series.
cudf.core.dataframe.DataFrame.min : Return the minimum values in a DataFrame.
Examples
>>> import cudf
>>> idx = cudf.Index([3, 2, 1])
>>> idx.min()
1
-
property
name¶ Returns the name of the Index.
-
property
names¶ Returns a tuple containing the name of the Index.
-
property
ndim¶ Dimension of the data. Apart from MultiIndex ndim is always 1.
-
notna()¶ Identify non-missing values. Alias for notnull.
-
notnull()¶ Identify non-missing values.
-
rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)¶ Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.
- Parameters
- axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Index to direct ranking.
- method : {‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’
How to rank the group of records that have the same value (i.e. ties):
* average: average rank of the group
* min: lowest rank in the group
* max: highest rank in the group
* first: ranks assigned in order they appear in the array
* dense: like ‘min’, but rank always increases by 1 between groups.
- numeric_only : bool, optional
For DataFrame objects, rank only numeric columns if set to True.
- na_option : {‘keep’, ‘top’, ‘bottom’}, default ‘keep’
How to rank NaN values:
* keep: assign NaN rank to NaN values
* top: assign smallest rank to NaN values if ascending
* bottom: assign highest rank to NaN values if ascending.
- ascending : bool, default True
Whether or not the elements should be ranked in ascending order.
- pct : bool, default False
Whether or not to display the returned rankings in percentile form.
- Returns
- same type as caller
Return a Series or DataFrame with data ranks as values.
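The tie-handling rules for method can be checked against a small pure-Python sketch. It is an illustration only, not cuDF's GPU implementation: it ranks a single list, assumes nulls are passed as None, and implements only na_option='keep' (nulls receive a NaN rank). The helper name rank is hypothetical.

```python
import math


def rank(values, method="average", ascending=True):
    """Pure-Python sketch of 1-based ranking with tie handling.

    Nulls are given as None and receive a NaN rank (na_option='keep').
    """
    # Positions of non-null values, ordered by value. Python's sort is
    # stable even with reverse=True, so tied values keep their original
    # order, which is what method='first' relies on.
    order = sorted(
        (i for i, v in enumerate(values) if v is not None),
        key=lambda i: values[i],
        reverse=not ascending,
    )
    ranks = [math.nan] * len(values)
    dense = 0
    pos = 0
    while pos < len(order):
        # Find the run order[pos:end] of tied values.
        end = pos
        while end < len(order) and values[order[end]] == values[order[pos]]:
            end += 1
        dense += 1
        for k in range(pos, end):
            if method == "average":
                r = (pos + 1 + end) / 2  # mean of ranks pos+1 .. end
            elif method == "min":
                r = pos + 1
            elif method == "max":
                r = end
            elif method == "first":
                r = k + 1
            elif method == "dense":
                r = dense
            else:
                raise ValueError(f"unknown method: {method}")
            ranks[order[k]] = float(r)
        pos = end
    return ranks
```

For example, rank([3, 1, 4, 1, 5]) gives [3.0, 1.5, 4.0, 1.5, 5.0], because the two 1s share ranks 1 and 2 and so both get their average.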
-
rename(name, inplace=False)¶ Alter Index name.
Defaults to returning new index.
- Parameters
- namelabel
Name(s) to set.
- Returns
- Index
-
repeat(repeats, axis=None)¶ Repeats elements consecutively.
Returns a new object of caller type (DataFrame/Series/Index) where each element of the current object is repeated consecutively a given number of times.
- Parameters
- repeatsint, or array of ints
The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.
- Returns
- Series/DataFrame/Index
A newly created object of same type as caller with repeated elements.
Examples
>>> import cudf >>> df = cudf.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]}) >>> df a b 0 1 10 1 2 20 2 3 30 >>> df.repeat(3) a b 0 1 10 0 1 10 0 1 10 1 2 20 1 2 20 1 2 20 2 3 30 2 3 30 2 3 30
Repeat on Series
>>> s = cudf.Series([0, 2]) >>> s 0 0 1 2 dtype: int64 >>> s.repeat([3, 4]) 0 0 0 0 0 0 1 2 1 2 1 2 1 2 dtype: int64 >>> s.repeat(2) 0 0 0 0 1 2 1 2 dtype: int64
Repeat on Index
>>> index = cudf.Index([10, 22, 33, 55]) >>> index Int64Index([10, 22, 33, 55], dtype='int64') >>> index.repeat(5) Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33, 33, 33, 33, 33, 55, 55, 55, 55, 55], dtype='int64')
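A pure-Python sketch of the repeat semantics described above, assuming plain lists stand in for the Series values and that a list-like repeats must supply one count per element (as in the s.repeat([3, 4]) example). The helper name repeat is hypothetical.

```python
def repeat(values, repeats):
    """Sketch of repeat(): each element is repeated consecutively."""
    if isinstance(repeats, int):
        # A scalar count applies to every element.
        repeats = [repeats] * len(values)
    if len(repeats) != len(values):
        raise ValueError("repeats must be a scalar or match the length of values")
    out = []
    for value, count in zip(values, repeats):
        if count < 0:
            raise ValueError("repeats must be non-negative")
        out.extend([value] * count)
    return out
```

With repeat([0, 2], [3, 4]) the result is [0, 0, 0, 2, 2, 2, 2], matching the Series example; a count of 0 yields an empty list.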
-
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, keep_index=True)¶ Return a random sample of items from an axis of object.
You can use random_state for reproducibility.
- Parameters
- nint, optional
Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.
- fracfloat, optional
Fraction of axis items to return. Cannot be used with n.
- replacebool, default False
Allow or disallow sampling of the same row more than once. replace=True is not yet supported for axis=1/‘columns’.
- weightsstr or ndarray-like, optional
Only supported for axis=1/‘columns’.
- random_stateint or None, default None
Seed for the random number generator (if int), or None. If None, a random seed will be chosen.
- axis{0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames). Series and Index don’t support axis=1.
- Returns
- Series or DataFrame or Index
A new object of same type as caller containing n items randomly sampled from the caller object.
Examples
>>> import cudf >>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]}) >>> df.sample(3) a 1 2 3 4 0 1
>>> sr = cudf.Series([1, 2, 3, 4, 5]) >>> sr.sample(10, replace=True) 1 4 3 1 2 4 0 5 0 1 4 5 4 1 0 2 0 3 3 2 dtype: int64
>>> df = cudf.DataFrame( ... {"a":[1, 2], "b":[2, 3], "c":[3, 4], "d":[4, 5]}) >>> df.sample(2, axis=1) a c 0 1 3 1 2 4
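A minimal sketch of sampling with and without replacement using Python's standard random module. It only illustrates the meaning of n, replace, and random_state; cuDF uses its own random number generator, so the same seed will not reproduce cuDF's output. The helper name sample and its list-based signature are hypothetical.

```python
import random


def sample(items, n, replace=False, random_state=None):
    """Sketch: draw n items, optionally with replacement, from a list."""
    # random.Random(None) seeds from system entropy; an int seed makes
    # the draw reproducible.
    rng = random.Random(random_state)
    if replace:
        # The same item may be drawn more than once.
        return [rng.choice(items) for _ in range(n)]
    if n > len(items):
        raise ValueError("cannot take a larger sample than the population when replace=False")
    # Without replacement every item appears at most once.
    return rng.sample(items, n)
```

Calling sample twice with the same random_state returns the same items, which is the reproducibility property the random_state parameter provides.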
-
scatter_by_map(map_index, map_size=None, keep_index=True, **kwargs)¶ Scatter to a list of dataframes.
Uses map_index to determine the destination of each row of the original DataFrame.
- Parameters
- map_indexSeries, str or list-like
Scatter assignment for each row
- map_sizeint
Length of output list. Must be greater than or equal to the number of unique values in map_index.
- keep_indexbool
Preserve the original index values for each row.
- Returns
- A list of cudf.DataFrame objects.
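A pure-Python sketch of the scattering step. It assumes map_index already holds integer partition numbers starting at 0 (the sketch does not handle a column name or non-integer labels) and that, when map_size is omitted, it defaults to the largest partition number plus one; both are assumptions for illustration. Each output row keeps its original position, mirroring keep_index=True. The helper name is hypothetical.

```python
def scatter_by_map(rows, map_index, map_size=None):
    """Sketch: row i is sent to output partition map_index[i]."""
    if len(map_index) != len(rows):
        raise ValueError("map_index must have one entry per row")
    if map_size is None:
        # Assumption: default to enough partitions for the largest label.
        map_size = max(map_index) + 1 if map_index else 0
    if any(not 0 <= dest < map_size for dest in map_index):
        raise ValueError("map_index values must lie in [0, map_size)")
    out = [[] for _ in range(map_size)]
    for (position, row), dest in zip(enumerate(rows), map_index):
        # Store the original position alongside the row (keep_index=True).
        out[dest].append((position, row))
    return out
```

Partitions that receive no rows stay empty, so a map_size larger than the number of distinct labels produces some empty outputs.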
-
searchsorted(values, side='left', ascending=True, na_position='last')¶ Find indices where elements should be inserted to maintain order.
- Parameters
- valuesFrame (Shape must be consistent with self)
Values to be hypothetically inserted into self.
- sidestr {‘left’, ‘right’} optional, default ‘left’
If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.
- ascendingbool optional, default True
Sorted Frame is in ascending order (otherwise descending)
- na_positionstr {‘last’, ‘first’} optional, default ‘last’
Position of null values in sorted order
- Returns
- 1-D cupy array of insertion points
Examples
>>> s = cudf.Series([1, 2, 3]) >>> s.searchsorted(4) 3 >>> s.searchsorted([0, 4]) array([0, 3], dtype=int32) >>> s.searchsorted([1, 3], side='left') array([0, 2], dtype=int32) >>> s.searchsorted([1, 3], side='right') array([1, 3], dtype=int32)
If the values are not monotonically sorted, wrong locations may be returned:
>>> s = cudf.Series([2, 1, 3]) >>> s.searchsorted(1) 0 # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]}) >>> df a b 0 1 10 1 3 12 2 5 14 3 7 16 >>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6], ... 'b': [10, 11, 13, 15]}) >>> values_df a b 0 0 10 1 2 11 2 5 13 3 6 15 >>> df.searchsorted(values_df, ascending=False) array([4, 4, 4, 0], dtype=int32)
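For a single ascending column, the side='left' and side='right' insertion points coincide with Python's bisect.bisect_left and bisect.bisect_right, which the following sketch uses. Like searchsorted, bisect assumes sorted input and returns meaningless positions otherwise. The sketch ignores nulls, ascending=False, and multi-column frames; the helper name is hypothetical.

```python
import bisect


def searchsorted(sorted_values, values, side="left"):
    """Sketch of searchsorted() for an ascending, null-free list."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    # bisect_left: first valid slot; bisect_right: last valid slot.
    find = bisect.bisect_left if side == "left" else bisect.bisect_right
    if isinstance(values, (list, tuple)):
        return [find(sorted_values, v) for v in values]
    return find(sorted_values, values)
```

searchsorted([1, 2, 3], [1, 3], side='right') returns [1, 3], and on the unsorted [2, 1, 3] a search for 1 returns 0, the same wrong answer as in the Series example.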
-
property
shape¶ Returns a tuple representing the dimensionality of the Index.
-
shift(periods=1, freq=None, axis=0, fill_value=None)¶ Shift values by periods positions.
-
sin()¶ Get Trigonometric sine, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360]) >>> ser 0 0.00000 1 0.32434 2 0.50000 3 45.00000 4 90.00000 5 180.00000 6 360.00000 dtype: float64 >>> ser.sin() 0 0.000000 1 0.318683 2 0.479426 3 0.850904 4 0.893997 5 -0.801153 6 0.958916 dtype: float64
sin operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15], ... 'second': [100.0, 360, 720, 300]}) >>> df first second 0 0.0 100.0 1 5.0 360.0 2 10.0 720.0 3 15.0 300.0 >>> df.sin() first second 0 0.000000 -0.506366 1 -0.958924 0.958916 2 -0.544021 -0.544072 3 0.650288 -0.999756
sin operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90]) >>> index Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64') >>> index.sin() Float64Index([-0.3894183423086505, -0.5063656411097588, 0.8011526357338306, 0.8939966636005579], dtype='float64')
-
property
size¶ Return the number of elements in the underlying data.
- Returns
- sizeSize of the DataFrame / Index / Series / MultiIndex
Examples
Size of an empty dataframe is 0.
>>> import cudf >>> df = cudf.DataFrame() >>> df Empty DataFrame Columns: [] Index: [] >>> df.size 0 >>> df = cudf.DataFrame(index=[1, 2, 3]) >>> df Empty DataFrame Columns: [] Index: [1, 2, 3] >>> df.size 0
DataFrame with values
>>> df = cudf.DataFrame({'a': [10, 11, 12], ... 'b': ['hello', 'rapids', 'ai']}) >>> df a b 0 10 hello 1 11 rapids 2 12 ai >>> df.size 6 >>> df.index RangeIndex(start=0, stop=3) >>> df.index.size 3
Size of an Index
>>> index = cudf.Index([]) >>> index Float64Index([], dtype='float64') >>> index.size 0 >>> index = cudf.Index([1, 2, 3, 10]) >>> index Int64Index([1, 2, 3, 10], dtype='int64') >>> index.size 4
Size of a MultiIndex
>>> midx = cudf.MultiIndex( ... levels=[["a", "b", "c", None], ["1", None, "5"]], ... codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]], ... names=["x", "y"], ... ) >>> midx MultiIndex(levels=[0 a 1 b 2 c 3 None dtype: object, 0 1 1 None 2 5 dtype: object], codes= x y 0 0 0 1 0 2 2 1 1 3 2 1 4 3 0) >>> midx.size 5
-
sort_values(return_indexer=False, ascending=True, key=None)¶ Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
- Parameters
- return_indexerbool, default False
Should the indices that would sort the index be returned.
- ascendingbool, default True
Should the index values be sorted in an ascending order.
- keyNone, optional
This parameter is NON-FUNCTIONAL.
- Returns
- sorted_indexIndex
Sorted copy of the index.
- indexercupy.ndarray, optional
The indices that the index itself was sorted by.
See also
cudf.core.series.Series.sort_valuesSort values of a Series.
cudf.core.dataframe.DataFrame.sort_valuesSort values in a DataFrame.
Examples
>>> import cudf >>> idx = cudf.Index([10, 100, 1, 1000]) >>> idx Int64Index([10, 100, 1, 1000], dtype='int64')
Sort values in ascending order (default behavior).
>>> idx.sort_values() Int64Index([1, 10, 100, 1000], dtype='int64')
Sort values in descending order, and also get the indices idx was sorted by.
>>> idx.sort_values(ascending=False, return_indexer=True) (Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2], dtype=int32))
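A pure-Python sketch of what return_indexer adds: the indexer lists the original positions in sorted order, so taking the unsorted values at those positions reproduces the sorted result. It is illustrative only (the helper name is hypothetical) and does not model null placement.

```python
def sort_values(values, ascending=True, return_indexer=False):
    """Sketch: sort a list and optionally return the positions it was sorted by."""
    # Sort the positions 0..n-1 by the value found at each position.
    indexer = sorted(range(len(values)), key=lambda i: values[i], reverse=not ascending)
    result = [values[i] for i in indexer]
    return (result, indexer) if return_indexer else result
```

For [10, 100, 1, 1000] in descending order, the indexer is [3, 1, 0, 2], matching the example above.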
-
sqrt()¶ Get the non-negative square-root of all elements, element-wise.
- Returns
- DataFrame/Series/Index
Result of the non-negative square-root of each element.
Examples
>>> import cudf >>> ser = cudf.Series([10, 25, 81, 1.0, 100]) >>> ser 0 10.0 1 25.0 2 81.0 3 1.0 4 100.0 dtype: float64 >>> ser.sqrt() 0 3.162278 1 5.000000 2 9.000000 3 1.000000 4 10.000000 dtype: float64
sqrt operation on DataFrame:
>>> df = cudf.DataFrame({'first': [-10.0, 100, 625], ... 'second': [1, 2, 0.4]}) >>> df first second 0 -10.0 1.0 1 100.0 2.0 2 625.0 0.4 >>> df.sqrt() first second 0 NaN 1.000000 1 10.0 1.414214 2 25.0 0.632456
sqrt operation on Index:
>>> index = cudf.Index([-10.0, 100, 625]) >>> index Float64Index([-10.0, 100.0, 625.0], dtype='float64') >>> index.sqrt() Float64Index([nan, 10.0, 25.0], dtype='float64')
-
sum()¶ Return the sum of all values of the Index.
- Returns
- scalar
Sum of all values.
Examples
>>> import cudf >>> idx = cudf.Index([3, 2, 1]) >>> idx.sum() 6
-
take(indices)¶ Gather only the specific subset of indices
- Parameters
- indices: An array-like that maps to values contained in this Index.
-
tan()¶ Get Trigonometric tangent, element-wise.
- Returns
- DataFrame/Series/Index
Result of the trigonometric operation.
Examples
>>> import cudf >>> ser = cudf.Series([0.0, 0.32434, 0.5, 45, 90, 180, 360]) >>> ser 0 0.00000 1 0.32434 2 0.50000 3 45.00000 4 90.00000 5 180.00000 6 360.00000 dtype: float64 >>> ser.tan() 0 0.000000 1 0.336213 2 0.546302 3 1.619775 4 -1.995200 5 1.338690 6 -3.380140 dtype: float64
tan operation on DataFrame:
>>> df = cudf.DataFrame({'first': [0.0, 5, 10, 15], ... 'second': [100.0, 360, 720, 300]}) >>> df first second 0 0.0 100.0 1 5.0 360.0 2 10.0 720.0 3 15.0 300.0 >>> df.tan() first second 0 0.000000 -0.587214 1 -3.380515 -3.380140 2 0.648361 0.648446 3 -0.855993 45.244742
tan operation on Index:
>>> index = cudf.Index([-0.4, 100, -180, 90]) >>> index Float64Index([-0.4, 100.0, -180.0, 90.0], dtype='float64') >>> index.tan() Float64Index([-0.4227932187381618, -0.587213915156929, -1.3386902103511544, -1.995200412208242], dtype='float64')
-
tile(count)¶ Repeats the rows of this DataFrame count times to form a new DataFrame.
- Parameters
- selfinput table containing the columns to tile.
- countNumber of times to tile “rows”. Must be non-negative.
- Returns
- The table containing the tiled “rows”.
Examples
>>> import cudf >>> df = cudf.DataFrame([[8, 4, 7], [5, 2, 3]]) >>> count = 2 >>> df.tile(count) 0 1 2 0 8 4 7 1 5 2 3 0 8 4 7 1 5 2 3
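A pure-Python sketch contrasting tile with repeat documented above: tile repeats the whole block of rows count times, whereas repeat places the copies of each row next to each other. Modeling rows as plain lists is an assumption for illustration, and the helper name is hypothetical.

```python
def tile(rows, count):
    """Sketch of tile(): the whole block of rows is repeated count times."""
    if count < 0:
        raise ValueError("count must be non-negative")
    # Outer loop over copies, inner loop over rows: the block order is kept.
    return [list(row) for _ in range(count) for row in rows]
```

tile([[8, 4, 7], [5, 2, 3]], 2) yields the rows in the order 8 4 7, 5 2 3, 8 4 7, 5 2 3, as in the example.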
-
to_array(fillna=None)¶ Get a dense numpy array for the data.
- Parameters
- fillnastr or None
Defaults to None, which will skip null values. If it equals “pandas”, null values are filled with NaNs. Non integral dtype is promoted to np.float64.
Notes
If fillna is None, null values are skipped. Therefore, the output size could be smaller.
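A minimal sketch of the two fillna modes, assuming nulls are represented as None and the values are numeric. Converting to float in the 'pandas' mode is an assumption made so that NaN can be stored; it is not a statement about cuDF's exact dtype rules. The helper name is hypothetical.

```python
import math


def to_array(values, fillna=None):
    """Sketch of the null handling of to_array(); nulls are given as None."""
    if fillna is None:
        # Nulls are skipped, so the result can be shorter than the input.
        return [v for v in values if v is not None]
    if fillna == "pandas":
        # Nulls become NaN; values are converted to float to hold NaN.
        return [math.nan if v is None else float(v) for v in values]
    raise ValueError("fillna must be None or 'pandas'")
```

With [1, None, 3], the default mode returns two elements while the 'pandas' mode returns three, the middle one NaN.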
-
to_arrow()¶ Convert Index to a PyArrow Array.
Examples
>>> import cudf >>> idx = cudf.Index([-3, 10, 15, 20]) >>> idx.to_arrow() <pyarrow.lib.Int64Array object at 0x7fcaa6f53440> [ -3, 10, 15, 20 ]
-
to_dlpack()¶ Converts a cuDF object into a DLPack tensor.
DLPack is an open-source memory tensor structure: dmlc/dlpack.
This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.
- Parameters
- cudf_objDataFrame, Series, Index, or Column
- Returns
- pycapsule_objPyCapsule
Output DLPack tensor pointer which is encapsulated in a PyCapsule object.
-
to_frame(index=True, name=None)¶ Create a DataFrame with a column containing this Index
- Parameters
- indexboolean, default True
Set the index of the returned DataFrame as the original Index
- namestr, default None
Name to be used for the column
- Returns
- DataFrame
cudf DataFrame
-
to_pandas()¶ Convert to a Pandas Index.
Examples
>>> import cudf >>> idx = cudf.Index([-3, 10, 15, 20]) >>> idx Int64Index([-3, 10, 15, 20], dtype='int64') >>> idx.to_pandas() Int64Index([-3, 10, 15, 20], dtype='int64') >>> type(idx.to_pandas()) <class 'pandas.core.indexes.numeric.Int64Index'> >>> type(idx) <class 'cudf.core.index.GenericIndex'>
-
to_series(index=None, name=None)¶ Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.
- Parameters
- indexIndex, optional
Index of resulting Series. If None, defaults to original index.
- namestr, optional
Name of resulting Series. If None, defaults to name of original index.
- Returns
- Series
The dtype will be based on the type of the Index values.
-
unique()¶ Return unique values in the index.
- Returns
- Index without duplicates
-
property
values¶ Return an array representing the data in the Index.
- Returns
- arrayA cupy array of data in the Index.
Examples
>>> import cudf >>> index = cudf.Index([1, -10, 100, 20]) >>> index.values array([ 1, -10, 100, 20]) >>> type(index.values) <class 'cupy.core.core.ndarray'>
-
property
values_host¶ Return a numpy representation of the Index.
Only the values in the Index will be returned.
- Returns
- outnumpy.ndarray
The values of the Index.
Examples
>>> import cudf >>> index = cudf.Index([1, -10, 100, 20]) >>> index.values_host array([ 1, -10, 100, 20]) >>> type(index.values_host) <class 'numpy.ndarray'>
-
where(cond, other=None)¶ Replace values where the condition is False.
- Parameters
- condbool array-like with the same length as self
Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.
- other: scalar, or array-like
Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.
- Returns
- Same type as caller
Examples
>>> import cudf >>> index = cudf.Index([4, 3, 2, 1, 0]) >>> index Int64Index([4, 3, 2, 1, 0], dtype='int64') >>> index.where(index > 2, 15) Int64Index([4, 3, 15, 15, 15], dtype='int64')
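A pure-Python sketch of the replacement rule, with plain lists standing in for the Index values and the condition. It accepts either a scalar other or an array-like other of matching length; the helper name is hypothetical.

```python
def where(values, cond, other=None):
    """Sketch of where(): keep values[i] where cond[i] is True, else use other."""
    if len(cond) != len(values):
        raise ValueError("cond must have the same length as values")
    if isinstance(other, (list, tuple)):
        if len(other) != len(values):
            raise ValueError("array-like other must match the length of values")
        # Take the replacement from the matching position of other.
        return [v if c else o for v, c, o in zip(values, cond, other)]
    # Scalar (or None) replacement for every False position.
    return [v if c else other for v, c in zip(values, cond)]
```

With values [4, 3, 2, 1, 0], the condition value > 2, and other=15, the result is [4, 3, 15, 15, 15], as in the example above.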
Categories¶
-
class
cudf.core.column.categorical.CategoricalAccessor(column, parent=None)¶ Accessor object for categorical properties of the Series values. Be aware that assigning to categories is an in-place operation, while all methods return new categorical data by default.
- Parameters
- dataSeries or CategoricalIndex
Examples
>>> s = cudf.Series([1,2,3], dtype='category') >>> s 0 1 1 2 2 3 dtype: category Categories (3, int64): [1, 2, 3] >>> s.cat.categories Int64Index([1, 2, 3], dtype='int64') >>> s.cat.reorder_categories([3,2,1]) 0 1 1 2 2 3 dtype: category Categories (3, int64): [3, 2, 1] >>> s.cat.remove_categories([1]) 0 null 1 2 2 3 dtype: category Categories (2, int64): [2, 3] >>> s.cat.set_categories(list('abcde')) 0 null 1 null 2 null dtype: category Categories (5, object): [a, b, c, d, e] >>> s.cat.as_ordered() 0 1 1 2 2 3 dtype: category Categories (3, int64): [1 < 2 < 3] >>> s.cat.as_unordered() 0 1 1 2 2 3 dtype: category Categories (3, int64): [1, 2, 3]
- Attributes
categoriesThe categories of this categorical.
codesReturn Series of codes as well as the index.
orderedWhether the categories have an ordered relationship.
Methods
add_categories(new_categories, **kwargs)Add new categories.
as_ordered(**kwargs)Set the Categorical to be ordered.
as_unordered(**kwargs)Set the Categorical to be unordered.
remove_categories(removals, **kwargs)Remove the specified categories.
reorder_categories(new_categories, **kwargs)Reorder categories as specified in new_categories.
set_categories(new_categories, **kwargs)Set the categories to the specified new_categories.
-
add_categories(new_categories, **kwargs)¶ Add new categories.
new_categories will be included at the last/highest place in the categories and will be unused directly after this call.
- Parameters
- new_categoriescategory or list-like of category
The new categories to be included.
- inplacebool, default False
Whether or not to add the categories inplace or return a copy of this categorical with added categories.
- Returns
- cat
Categorical with new categories added or None if inplace.
Examples
>>> import cudf >>> s = cudf.Series([1, 2], dtype="category") >>> s 0 1 1 2 dtype: category Categories (2, int64): [1, 2] >>> s.cat.add_categories([0, 3, 4]) 0 1 1 2 dtype: category Categories (5, int64): [1, 2, 0, 3, 4] >>> s 0 1 1 2 dtype: category Categories (2, int64): [1, 2] >>> s.cat.add_categories([0, 3, 4], inplace=True) >>> s 0 1 1 2 dtype: category Categories (5, int64): [1, 2, 0, 3, 4]
-
as_ordered(**kwargs)¶ Set the Categorical to be ordered.
- Parameters
- inplacebool, default False
Whether or not to set the ordered attribute in-place or return a copy of this categorical with ordered set to True.
- Returns
- Categorical
Ordered Categorical or None if inplace.
Examples
>>> import cudf >>> s = cudf.Series([10, 1, 1, 2, 10, 2, 10], dtype="category") >>> s 0 10 1 1 2 1 3 2 4 10 5 2 6 10 dtype: category Categories (3, int64): [1, 2, 10] >>> s.cat.as_ordered() 0 10 1 1 2 1 3 2 4 10 5 2 6 10 dtype: category Categories (3, int64): [1 < 2 < 10] >>> s.cat.as_ordered(inplace=True) >>> s 0 10 1 1 2 1 3 2 4 10 5 2 6 10 dtype: category Categories (3, int64): [1 < 2 < 10]
-
as_unordered(**kwargs)¶ Set the Categorical to be unordered.
- Parameters
- inplacebool, default False
Whether or not to set the ordered attribute in-place or return a copy of this categorical with ordered set to False.
- Returns
- Categorical
Unordered Categorical or None if inplace.
Examples
>>> import cudf >>> s = cudf.Series([10, 1, 1, 2, 10, 2, 10], dtype="category") >>> s 0 10 1 1 2 1 3 2 4 10 5 2 6 10 dtype: category Categories (3, int64): [1, 2, 10] >>> s = s.cat.as_ordered() >>> s 0 10 1 1 2 1 3 2 4 10 5 2 6 10 dtype: category Categories (3, int64): [1 < 2 < 10] >>> s.cat.as_unordered() 0 10 1 1 2 1 3 2 4 10 5 2 6 10 dtype: category Categories (3, int64): [1, 2, 10] >>> s.cat.as_unordered(inplace=True) >>> s 0 10 1 1 2 1 3 2 4 10 5 2 6 10 dtype: category Categories (3, int64): [1, 2, 10]
-
property
categories¶ The categories of this categorical.
-
property
codes¶ Return Series of codes as well as the index.
-
property
ordered¶ Whether the categories have an ordered relationship.
-
remove_categories(removals, **kwargs)¶ Remove the specified categories.
removals must be included in the old categories. Values which were in the removed categories will be set to null.
- Parameters
- removalscategory or list-like of category
The categories which should be removed.
- inplacebool, default False
Whether or not to remove the categories inplace or return a copy of this categorical with removed categories.
- Returns
- cat
Categorical with removed categories or None if inplace.
Examples
>>> import cudf >>> s = cudf.Series([10, 1, 1, 2, 10, 2, 10], dtype="category") >>> s 0 10 1 1 2 1 3 2 4 10 5 2 6 10 dtype: category Categories (3, int64): [1, 2, 10] >>> s.cat.remove_categories([1]) 0 10 1 null 2 null 3 2 4 10 5 2 6 10 dtype: category Categories (2, int64): [2, 10] >>> s 0 10 1 1 2 1 3 2 4 10 5 2 6 10 dtype: category Categories (3, int64): [1, 2, 10] >>> s.cat.remove_categories([10], inplace=True) >>> s 0 null 1 1 2 1 3 2 4 null 5 2 6 null dtype: category Categories (2, int64): [1, 2]
-
reorder_categories(new_categories, **kwargs)¶ Reorder categories as specified in new_categories.
new_categories need to include all old categories and no new category items.
- Parameters
- new_categoriesIndex-like
The categories in new order.
- orderedbool, optional
Whether or not the categorical is treated as an ordered categorical. If not given, do not change the ordered information.
- inplacebool, default False
Whether or not to reorder the categories inplace or return a copy of this categorical with reordered categories.
- Returns
- cat
Categorical with reordered categories or None if inplace.
- Raises
- ValueError
If the new categories do not contain all old category items or any new ones.
Examples
>>> import cudf >>> s = cudf.Series([10, 1, 1, 2, 10, 2, 10], dtype="category") >>> s 0 10 1 1 2 1 3 2 4 10 5 2 6 10 dtype: category Categories (3, int64): [1, 2, 10] >>> s.cat.reorder_categories([10, 1, 2]) 0 10 1 1 2 1 3 2 4 10 5 2 6 10 dtype: category Categories (3, int64): [10, 1, 2] >>> s.cat.reorder_categories([10, 1]) ValueError: items in new_categories are not the same as in old categories
-
set_categories(new_categories, **kwargs)¶ Set the categories to the specified new_categories.
new_categories can include new categories (which will result in unused categories) or remove old categories (which results in values set to null). If rename==True, the categories will simply be renamed (fewer or more items than in old categories will result in values set to null or in unused categories respectively).
This method can be used to perform more than one action of adding, removing, and reordering simultaneously and is therefore faster than performing the individual steps via the more specialised methods.
On the other hand, this method does not perform checks (e.g., whether the old categories are included in the new categories on a reorder), which can result in surprising changes.
- Parameters
- new_categorieslist-like
The categories in new order.
- orderedbool, default False
Whether or not the categorical is treated as an ordered categorical. If not given, do not change the ordered information.
- renamebool, default False
Whether or not the new_categories should be considered as a rename of the old categories or as reordered categories.
- inplacebool, default False
Whether or not to reorder the categories in-place or return a copy of this categorical with reordered categories.
- Returns
- cat
Categorical with reordered categories or None if inplace.
Examples
>>> import cudf >>> s = cudf.Series([1, 1, 2, 10, 2, 10], dtype='category') >>> s 0 1 1 1 2 2 3 10 4 2 5 10 dtype: category Categories (3, int64): [1, 2, 10] >>> s.cat.set_categories([1, 10]) 0 1 1 1 2 null 3 10 4 null 5 10 dtype: category Categories (2, int64): [1, 10] >>> s.cat.set_categories([1, 10], inplace=True) >>> s 0 1 1 1 2 null 3 10 4 null 5 10 dtype: category Categories (2, int64): [1, 10]
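Categorical data can be pictured as integer codes pointing into the categories list. The sketch below recodes values against new_categories; values that are not among the new categories end up null (None). Using -1 as the null code is an assumption borrowed from pandas' convention, not a claim about cuDF's internal layout, and the helper name is hypothetical.

```python
def set_categories(values, new_categories):
    """Sketch: recode values; those outside new_categories become None."""
    # Map each category to its position (its code) in new_categories.
    lookup = {cat: code for code, cat in enumerate(new_categories)}
    codes = [lookup.get(v, -1) for v in values]  # -1 marks a null code here
    decoded = [new_categories[c] if c >= 0 else None for c in codes]
    return codes, decoded
```

For [1, 1, 2, 10, 2, 10] with new categories [1, 10], the decoded values are [1, 1, None, 10, None, 10], matching the nulls in the example above.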
GroupBy¶
-
class
cudf.core.groupby.groupby.GroupBy(obj, by=None, level=None, sort=True, as_index=True, dropna=True)¶ Group a DataFrame or Series by a set of columns.
- Parameters
- byoptional
Specifies the grouping columns. Can be any of the following:
- A Python function called on each value of the object’s index
- A dict or Series that maps index labels to group names
- A cudf.Index object
- A str indicating a column name
- An array of the same length as the object
- A Grouper object
- A list of the above
- levelint, level_name or list, optional
For objects with a MultiIndex, level can be used to specify grouping by one or more levels of the MultiIndex.
- sortbool, optional
If True (default), sort results by group(s). Note that unlike Pandas, this also sorts values within each group.
- as_indexbool, optional
If as_index=True (default), the group names appear as the keys of the resulting DataFrame. If as_index=False, the groups are returned as ordinary columns of the resulting DataFrame, if they are named columns.
- dropnabool, optional
If True (default), do not include the “null” group.
Methods
agg(func)Apply aggregation(s) to the groups.
aggregate(func)Apply aggregation(s) to the groups.
apply(function)Apply a python transformation function over the grouped chunk.
apply_grouped(function, **kwargs)Apply a transformation function over the grouped chunk.
device_deserialize(header, frames)Convert serialized header and frames back into respective Object Type
device_serialize()Converts the object into a header and list of Buffer/memoryview objects for file storage or network transmission.
host_deserialize(header, frames)Convert serialized header and frames back into respective Object Type
host_serialize()Converts the object into a header and list of memoryview objects for file storage or network transmission.
nth(n)Return the nth row from each group.
nunique()Return the number of unique values per group.
rolling(*args, **kwargs)Returns a RollingGroupby object that enables rolling window calculations on the groups.
size()Return the size of each group.
-
agg(func)¶ Apply aggregation(s) to the groups.
- Parameters
- funcstr, callable, list or dict
- Returns
- A Series or DataFrame containing the combined results of the aggregation.
Examples
>>> import cudf >>> a = cudf.DataFrame({'a': [1, 1, 2], 'b': [1, 2, 3], 'c': [2, 2, 1]}) >>> a.groupby('a').agg('sum') b c a 1 3 4 2 3 1
Specifying a list of aggregations to perform on each column.
>>> a.groupby('a').agg(['sum', 'min']) b c sum min sum min a 1 3 1 4 2 2 3 3 1 1
Using a dict to specify aggregations to perform per column.
>>> a.groupby('a').agg({'a': 'max', 'b': ['min', 'mean']}) a b max min mean a 1 1 1 1.5 2 2 3 3.0
Using lambdas/callables to specify aggregations taking parameters.
>>> f1 = lambda x: x.quantile(0.5); f1.__name__ = "q0.5" >>> f2 = lambda x: x.quantile(0.75); f2.__name__ = "q0.75" >>> a.groupby('a').agg([f1, f2]) b c q0.5 q0.75 q0.5 q0.75 a 1 1.5 1.75 2.0 2.0 2 3.0 3.00 1.0 1.0
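A pure-Python sketch of dict-style aggregation, assuming columns are plain lists and the spec maps each column name to a list of aggregation names. Groups are emitted in sorted key order, matching sort=True. The helper groupby_agg and its AGGS table are hypothetical; with the a and b data from the examples above, it reproduces the dict-based result (a max, b min, b mean).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical table mapping aggregation names to Python functions.
AGGS = {"sum": sum, "min": min, "max": max, "mean": mean}


def groupby_agg(keys, columns, spec):
    """Sketch of groupby(...).agg(spec) with spec = {column: [agg names]}."""
    # Collect the row positions belonging to each group key.
    groups = defaultdict(list)
    for row, key in enumerate(keys):
        groups[key].append(row)
    result = {}
    for key in sorted(groups):  # sort=True: results ordered by group key
        rows = groups[key]
        result[key] = {
            (col, agg): AGGS[agg]([columns[col][r] for r in rows])
            for col, aggs in spec.items()
            for agg in aggs
        }
    return result
```

For keys [1, 1, 2] and b = [1, 2, 3], group 1 gets b min 1 and b mean 1.5, and group 2 gets 3 for both.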
-
aggregate(func)¶ Apply aggregation(s) to the groups.
- Parameters
- funcstr, callable, list or dict
- Returns
- A Series or DataFrame containing the combined results of the aggregation.
Examples
>>> import cudf >>> a = cudf.DataFrame({'a': [1, 1, 2], 'b': [1, 2, 3], 'c': [2, 2, 1]}) >>> a.groupby('a').agg('sum') b c a 1 3 4 2 3 1
Specifying a list of aggregations to perform on each column.
>>> a.groupby('a').agg(['sum', 'min']) b c sum min sum min a 1 3 1 4 2 2 3 3 1 1
Using a dict to specify aggregations to perform per column.
>>> a.groupby('a').agg({'a': 'max', 'b': ['min', 'mean']}) a b max min mean a 1 1 1 1.5 2 2 3 3.0
Using lambdas/callables to specify aggregations taking parameters.
>>> f1 = lambda x: x.quantile(0.5); f1.__name__ = "q0.5" >>> f2 = lambda x: x.quantile(0.75); f2.__name__ = "q0.75" >>> a.groupby('a').agg([f1, f2]) b c q0.5 q0.75 q0.5 q0.75 a 1 1.5 1.75 2.0 2.0 2 3.0 3.00 1.0 1.0
-
apply(function)¶ Apply a python transformation function over the grouped chunk.
- Parameters
- funcfunction
The python transformation function that will be applied on the grouped chunk.
Examples
from cudf import DataFrame

df = DataFrame()
df['key'] = [0, 0, 1, 1, 2, 2, 2]
df['val'] = [0, 1, 2, 3, 4, 5, 6]
groups = df.groupby(['key'])

# Define a function to apply to each group
def mult(df):
    df['out'] = df['key'] * df['val']
    return df

result = groups.apply(mult)
print(result)
Output:
   key  val  out
0    0    0    0
1    0    1    0
2    1    2    2
3    1    3    3
4    2    4    8
5    2    5   10
6    2    6   12
-
apply_grouped(function, **kwargs)¶ Apply a transformation function over the grouped chunk.
This uses numba’s CUDA JIT compiler to convert the Python transformation function into a CUDA kernel, and therefore incurs a compilation overhead on the first run.
- Parameters
- funcfunction
The transformation function that will be executed on the CUDA GPU.
- incols: list
A list of names of input columns.
- outcols: dict
A dictionary of output column names and their dtype.
- kwargsdict
name-value of extra arguments. These values are passed directly into the function.
Examples
from cudf import DataFrame
from numba import cuda
import numpy as np

df = DataFrame()
df['key'] = [0, 0, 1, 1, 2, 2, 2]
df['val'] = [0, 1, 2, 3, 4, 5, 6]
groups = df.groupby(['key'])

# Define a function to apply to each group
def mult_add(key, val, out1, out2):
    for i in range(cuda.threadIdx.x, len(key), cuda.blockDim.x):
        out1[i] = key[i] * val[i]
        out2[i] = key[i] + val[i]

result = groups.apply_grouped(mult_add,
                              incols=['key', 'val'],
                              outcols={'out1': np.int32,
                                       'out2': np.int32},
                              # threads per block
                              tpb=8)

print(result)
Output:
   key  val  out1  out2
0    0    0     0     0
1    0    1     0     1
2    1    2     2     3
3    1    3     3     4
4    2    4     8     6
5    2    5    10     7
6    2    6    12     8
import cudf
import numpy as np
from numba import cuda
import pandas as pd
from random import randint

# Create a random 15 row dataframe with one categorical
# feature and one random integer valued feature
df = cudf.DataFrame(
    {
        "cat": [1] * 5 + [2] * 5 + [3] * 5,
        "val": [randint(0, 100) for _ in range(15)],
    }
)

# Group the dataframe by its categorical feature
groups = df.groupby("cat")

# Define a kernel which takes the moving average of a
# sliding window
def rolling_avg(val, avg):
    win_size = 3
    for i in range(cuda.threadIdx.x, len(val), cuda.blockDim.x):
        if i < win_size - 1:
            # If there is not enough data to fill the window,
            # take the average to be NaN
            avg[i] = np.nan
        else:
            total = 0
            for j in range(i - win_size + 1, i + 1):
                total += val[j]
            avg[i] = total / win_size

# Compute moving averages on all groups
results = groups.apply_grouped(rolling_avg,
                               incols=['val'],
                               outcols=dict(avg=np.float64))
print("Results:", results)

# Note this gives the same result as its pandas equivalent
pdf = df.to_pandas()
pd_results = pdf.groupby('cat')['val'].rolling(3).mean()
Output:
Results:
   cat  val                 avg
0    1   16
1    1   45
2    1   62                41.0
3    1   45  50.666666666666664
4    1   26  44.333333333333336
5    2    5
6    2   51
7    2   77  44.333333333333336
8    2    1                43.0
9    2   46  41.333333333333336
[5 more rows]
This is functionally equivalent to pandas.DataFrame.Rolling
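The per-group logic of the rolling_avg kernel can be written in plain Python, which makes the expected values easy to check by hand. The sketch assumes rows are already ordered by group, as in the example, and uses a trailing window with NaN wherever the window is incomplete; it reproduces the avg column shown for groups 1 and 2. The helper name is hypothetical, and this is not how cuDF executes the kernel.

```python
import math


def grouped_rolling_mean(keys, values, window=3):
    """Sketch: trailing rolling mean computed independently within each group."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out = [math.nan] * len(values)
    start = 0
    while start < len(keys):
        # Find the contiguous run keys[start:end] belonging to one group.
        end = start
        while end < len(keys) and keys[end] == keys[start]:
            end += 1
        # Only positions with a full window inside the group get a value.
        for i in range(start + window - 1, end):
            out[i] = sum(values[i - window + 1 : i + 1]) / window
        start = end
    return out
```

With group 1 values [16, 45, 62, 45, 26], the third row gets (16 + 45 + 62) / 3 = 41.0, and the first two rows stay NaN.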
-
nth(n)¶ Return the nth row from each group.
-
nunique()¶ Return the number of unique values per group.
-
rolling(*args, **kwargs)¶ Returns a RollingGroupby object that enables rolling window calculations on the groups.
See also
cudf.core.window.Rolling
-
size()¶ Return the size of each group.
General utility functions¶
-
cudf.testing.testing.assert_column_equal(left, right, check_dtype=True, check_column_type='equiv', check_less_precise=False, check_exact=False, check_datetimelike_compat=False, check_categorical=True, check_category_order=True, obj='ColumnBase')¶ Check that left and right columns are equal
This function is intended to compare two columns and output any differences. Additional parameters allow varying the strictness of the equality checks performed.
-
cudf.testing.testing.assert_frame_equal(left, right, check_dtype=True, check_index_type='equiv', check_column_type='equiv', check_frame_type=True, check_less_precise=False, by_blocks=False, check_names=True, check_exact=False, check_datetimelike_compat=False, check_categorical=True, check_like=False, obj='DataFrame')¶ Check that left and right DataFrame are equal
This function is intended to compare two DataFrames and output any differences. Additional parameters allow varying the strictness of the equality checks performed.
Examples
>>> import cudf >>> df1 = cudf.DataFrame({"a":[1, 2], "b":[1.0, 2.0]}, index=[1, 2]) >>> df2 = cudf.DataFrame({"a":[1, 2], "b":[1.0, 2.0]}, index=[2, 3]) >>> cudf.testing.assert_frame_equal(df1, df2) ...... ...... AssertionError: ColumnBase are different
values are different (100.0 %) [left]: [1 2] [right]: [2 3]
>>> df2 = cudf.DataFrame({"a":[1, 2], "c":[1.0, 2.0]}, index=[1, 2]) >>> cudf.testing.assert_frame_equal(df1, df2) ...... ...... AssertionError: DataFrame.columns are different
DataFrame.columns values are different (50.0 %) [left]: Index(['a', 'b'], dtype='object') [right]: Index(['a', 'c'], dtype='object')
>>> df2 = cudf.DataFrame({"a":[1, 2], "b":[1.0, 3.0]}, index=[1, 2]) >>> cudf.testing.assert_frame_equal(df1, df2) ...... ...... AssertionError: Column name="b" are different
values are different (50.0 %) [left]: [1. 2.] [right]: [1. 3.]
# This will pass without any hitch >>> df2 = cudf.DataFrame({"a":[1, 2], "b":[1.0, 2.0]}, index=[1, 2]) >>> cudf.testing.assert_frame_equal(df1, df2)
-
cudf.testing.testing.assert_index_equal(left, right, exact='equiv', check_names: bool = True, check_less_precise: Union[bool, int] = False, check_exact: bool = True, check_categorical: bool = True, obj: str = 'Index')¶ Check that left and right Index are equal
This function is intended to compare two Index objects and output any differences. Additional parameters allow varying the strictness of the equality checks performed.
Examples
>>> import cudf
>>> id1 = cudf.Index([1, 2, 3, 4], name="a")
>>> id2 = cudf.Index([1, 2, 3, 5])
>>> cudf.testing.assert_index_equal(id1, id2)
......
......
AssertionError: ColumnBase are different

values are different (25.0 %)
[left]:  [1 2 3 4]
[right]: [1 2 3 5]
>>> id2 = cudf.Index([1, 2, 3, 4], name="b")
>>> cudf.testing.assert_index_equal(id1, id2)
......
......
AssertionError: Index are different

name mismatch
[left]:  a
[right]: b

This will pass without any hitch:

>>> id2 = cudf.Index([1, 2, 3, 4], name="a")
>>> cudf.testing.assert_index_equal(id1, id2)
-
cudf.testing.testing.assert_series_equal(left, right, check_dtype=True, check_index_type='equiv', check_series_type=True, check_less_precise=False, check_names=True, check_exact=False, check_datetimelike_compat=False, check_categorical=True, check_category_order=True, obj='Series')
Check that the left and right Series are equal.
This function is intended to compare two Series and output any differences. Additional parameters allow varying the strictness of the equality checks performed.
Examples
>>> import cudf
>>> sr1 = cudf.Series([1, 2, 3, 4], name="a")
>>> sr2 = cudf.Series([1, 2, 3, 5], name="b")
>>> cudf.testing.assert_series_equal(sr1, sr2)
......
......
AssertionError: ColumnBase are different

values are different (25.0 %)
[left]:  [1 2 3 4]
[right]: [1 2 3 5]
>>> sr2 = cudf.Series([1, 2, 3, 4], name="b")
>>> cudf.testing.assert_series_equal(sr1, sr2)
......
......
AssertionError: Series are different

name mismatch
[left]:  a
[right]: b

This will pass without any hitch:

>>> sr2 = cudf.Series([1, 2, 3, 4], name="a")
>>> cudf.testing.assert_series_equal(sr1, sr2)
IO
-
cudf.io.csv.read_csv(filepath_or_buffer, lineterminator='\n', quotechar='"', quoting=0, doublequote=True, header='infer', mangle_dupe_cols=True, usecols=None, sep=',', delimiter=None, delim_whitespace=False, skipinitialspace=False, names=None, dtype=None, skipfooter=0, skiprows=0, dayfirst=False, compression='infer', thousands=None, decimal='.', true_values=None, false_values=None, nrows=None, byte_range=None, skip_blank_lines=True, parse_dates=None, comment=None, na_values=None, keep_default_na=True, na_filter=True, prefix=None, index_col=None, **kwargs)
Load a comma-separated values (CSV) dataset into a DataFrame.
- Parameters
- filepath_or_buffer : str, path object, or file-like object
Either a path to a file (a str, pathlib.Path, or py._path.local.LocalPath), URL (including http, ftp, and S3 locations), or any object with a read() method (such as builtin open() file handler function or StringIO).
- sep : char, default ','
Delimiter to be used.
- delimiter : char, default None
Alternative argument name for sep.
- delim_whitespace : bool, default False
Determines whether to use whitespace as delimiter.
- lineterminator : char, default '\n'
Character to indicate end of line.
- skipinitialspace : bool, default False
Skip spaces after delimiter.
- names : list of str, default None
List of column names to be used.
- dtype : type, list of types, or dict of column -> type, default None
Data type(s) for data or columns. If list, types are applied in the same order as the column names. If dict, types are mapped to the column names, e.g. {'a': np.float64, 'b': np.int32, 'c': 'float'}. If None, dtypes are inferred from the dataset. Use str to preserve data and not infer or interpret to dtype.
- quotechar : char, default '"'
Character to indicate start and end of quote item.
- quoting : str or int, default 0
Controls quoting behavior. Set to one of 0 (csv.QUOTE_MINIMAL), 1 (csv.QUOTE_ALL), 2 (csv.QUOTE_NONNUMERIC) or 3 (csv.QUOTE_NONE). Quoting is enabled with all values except 3.
- doublequote : bool, default True
When quoting is enabled, indicates whether to interpret two consecutive quotechar inside fields as a single quotechar.
- header : int or 'infer', default 'infer'
Row number to use as the column names. Default behavior is to infer the column names: if no names are passed, header=0; if column names are passed explicitly, header=None.
- usecols : list of int or str, default None
Returns a subset of the columns given in the list. All elements must be either integer indices (column number) or strings that correspond to column names.
- mangle_dupe_cols : boolean, default True
Duplicate columns will be specified as 'X', 'X.1', ..., 'X.N'.
- skiprows : int, default 0
Number of rows to be skipped from the start of the file.
- skipfooter : int, default 0
Number of rows to be skipped at the bottom of the file.
- compression : {'infer', 'gzip', 'zip', None}, default 'infer'
For on-the-fly decompression of on-disk data. If 'infer', then detect compression from the following extensions: '.gz', '.zip' (otherwise no decompression). If using 'zip', the ZIP file should contain only one data file; otherwise, the first non-zero-sized file will be used. Set to None for no decompression.
- decimal : char, default '.'
Character used as a decimal point.
- thousands : char, default None
Character used as a thousands delimiter.
- true_values : list, default None
Values to consider as boolean True.
- false_values : list, default None
Values to consider as boolean False.
- nrows : int, default None
If specified, maximum number of rows to read.
- byte_range : list or tuple, default None
Byte range within the input file to be read. The first number is the offset in bytes, the second number is the range size in bytes. Set the size to zero to read all data after the offset location. Reads the row that starts before or at the end of the range, even if it ends after the end of the range.
- skip_blank_lines : bool, default True
If True, discard and do not parse empty lines. If False, interpret empty lines as NaN values.
- parse_dates : list of int or names, default None
If list of columns, then attempt to parse each entry as a date. Columns may not always be recognized as dates, for instance due to unusual or non-standard formats. To guarantee a date and increase parsing speed, explicitly specify dtype='date' for the desired columns.
- comment : char, default None
Character used as a comments indicator. If found at the beginning of a line, the line will be ignored altogether.
- na_values : list, default None
Values to consider as invalid (null).
- keep_default_na : bool, default True
Whether or not to include the default NA values when parsing the data.
- na_filter : bool, default True
Detect missing values (empty strings and the values in na_values). Passing False can improve performance.
- prefix : str, default None
Prefix to add to column numbers when parsing without a header row.
- index_col : int, string or False, default None
Column to use as the row labels of the DataFrame. Passing index_col=False explicitly disables index column inference and discards the last column.
- Returns
- GPU DataFrame object.
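The header='infer' rule and the duplicate-name scheme of mangle_dupe_cols are both simple enough to model in a few lines of plain Python. The two helpers below are hypothetical illustrations of the documented behavior, not cuDF internals.

```python
def resolve_header(header="infer", names=None):
    """Hypothetical helper applying the documented header='infer' rule."""
    if header == "infer":
        # No explicit names: row 0 holds the column names.
        # Explicit names: the file is treated as having no header row.
        return 0 if names is None else None
    return header  # an explicit row number is used as given


def mangle_dupe_cols(columns):
    """Hypothetical helper: rename duplicates to 'X', 'X.1', ..., 'X.N'."""
    counts = {}
    result = []
    for name in columns:
        if name in counts:
            counts[name] += 1
            result.append(f"{name}.{counts[name]}")
        else:
            counts[name] = 0
            result.append(name)
    return result


print(resolve_header())                        # 0
print(resolve_header(names=["a", "b"]))        # None
print(mangle_dupe_cols(["X", "Y", "X", "X"]))  # ['X', 'Y', 'X.1', 'X.2']
```

This sketch does not handle a mangled name that collides with an existing column, for example a file that already contains a column named 'X.1'.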
See also
Notes
cuDF supports local and remote data stores. See the cuDF configuration documentation for details on the available sources.
Examples
Create a test CSV file:

>>> import cudf
>>> filename = 'foo.csv'
>>> lines = [
...   "num1,datetime,text",
...   "123,2018-11-13T12:00:00,abc",
...   "456,2018-11-14T12:35:01,def",
...   "789,2018-11-15T18:02:59,ghi"
... ]
>>> with open(filename, 'w') as fp:
...     fp.write('\n'.join(lines)+'\n')

Read the file with cudf.read_csv:

>>> cudf.read_csv(filename)
  num1                datetime text
0  123 2018-11-13T12:00:00.000 5451
1  456 2018-11-14T12:35:01.000 5784
2  789 2018-11-15T18:02:59.000 6117
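The byte_range parameter lets several readers share one file. Each reader takes a range, and a row that starts inside that range is returned whole, even if it runs past the end. The pure-Python model below assumes rows are separated by '\n' and assigns a row to a range when its first byte lies in the half-open interval [offset, offset + size). That boundary convention is an assumption, so check cuDF's actual behavior for a row starting exactly at the end of a range. rows_in_byte_range is a hypothetical helper.

```python
def rows_in_byte_range(data: bytes, offset: int, size: int) -> list:
    """Hypothetical model of byte_range: return the rows whose first byte
    lies in [offset, offset + size); size == 0 means "to the end"."""
    # Row start positions: byte 0 plus the byte after every newline.
    starts = [0] + [i + 1 for i, b in enumerate(data)
                    if b == ord("\n") and i + 1 < len(data)]
    end = len(data) if size == 0 else offset + size
    rows = []
    for start in starts:
        if offset <= start < end:  # assumption: half-open interval
            stop = data.find(b"\n", start)
            # The whole row is kept, even if it ends after `end`.
            rows.append(data[start:] if stop == -1 else data[start:stop])
    return rows


data = b"a,1\nbb,2\nccc,3\n"
first = rows_in_byte_range(data, 0, 6)  # 'bb,2' starts at byte 4, ends past byte 6
rest = rows_in_byte_range(data, 6, 0)   # everything after byte 6
```

Under this rule, every row is claimed by exactly one range, so adjacent ranges together read each row once.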
-
cudf.io.csv.to_csv(df, path=None, sep=',', na_rep='', columns=None, header=True, index=True, line_terminator='\n', chunksize=None, **kwargs)
Write a DataFrame to CSV file format.
- Parameters
- df : DataFrame
DataFrame object to be written to CSV.
- path : str, default None
Path of the file where the DataFrame will be written.
- sep : char, default ','
Delimiter to be used.
- na_rep : str, default ''
String to use for null entries.
- columns : list of str, optional
Columns to write.
- header : bool, default True
Write out the column names.
- index : bool, default True
Write out the index as a column.
- line_terminator : char, default '\n'
Character used to indicate end of line.
- chunksize : int or None, default None
Rows to write at a time.
See also
Notes
Follows the Pandas csv.QUOTE_NONNUMERIC quoting convention for all output.
If to_csv leads to memory errors, consider setting the chunksize argument.
Examples
Write a DataFrame to CSV:

>>> import cudf
>>> filename = 'foo.csv'
>>> df = cudf.DataFrame({'x': [0, 1, 2, 3],
...                      'y': [1.0, 3.3, 2.2, 4.4],
...                      'z': ['a', 'b', 'c', 'd']})
>>> df = df.set_index([3, 2, 1, 0])
>>> df.to_csv(filename)
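The Notes above describe two behaviors: output follows csv.QUOTE_NONNUMERIC, and chunksize limits how many rows are written at a time. The sketch below shows both with Python's standard csv module. It is a model of the idea, not cuDF's writer: write_csv_chunked is hypothetical, the empty header cell for the index column is an assumption, and cuDF's exact number formatting may differ.

```python
import csv
import io


def write_csv_chunked(header, rows, chunksize=None, na_rep=""):
    """Hypothetical helper: write rows with QUOTE_NONNUMERIC quoting,
    `chunksize` rows at a time, replacing None with `na_rep`."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerow(header)
    step = chunksize or len(rows) or 1
    for start in range(0, len(rows), step):
        chunk = rows[start:start + step]
        # Nulls become na_rep; as a string it is quoted like any non-numeric field.
        writer.writerows([[na_rep if v is None else v for v in row] for row in chunk])
    return buf.getvalue()


rows = [[3, 0, 1.0, "a"], [2, 1, None, "b"]]
text = write_csv_chunked(["", "x", "y", "z"], rows, chunksize=1)
# '"","x","y","z"\n3,0,1.0,"a"\n2,1,"","b"\n'
```

Numbers are written unquoted and every string, including the na_rep placeholder, is quoted. Chunking changes only how much is buffered at once, not the resulting text.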
-
cudf.io.parquet.merge_parquet_filemetadata(filemetadata_list)
Merge multiple parquet metadata blobs.
- Parameters
- filemetadata_list : list
List of buffers returned by to_parquet.
- Returns
- Combined parquet metadata blob
See also
-
cudf.io.parquet.read_parquet(filepath_or_buffer, engine='cudf', columns=None, row_groups=None, skip_rows=None, num_rows=None, strings_to_categorical=False, use_pandas_metadata=True, *args, **kwargs)
Load a Parquet dataset into a DataFrame.
- Parameters
- filepath_or_buffer : str, path object, bytes, file-like object, or a list of such objects
Contains one or more of the following: either a path to a file (a str, pathlib.Path, or py._path.local.LocalPath), URL (including http, ftp, and S3 locations), Python bytes of raw binary data, or any object with a read() method (such as builtin open() file handler function or BytesIO).
- engine : {'cudf', 'pyarrow'}, default 'cudf'
Parser engine to use.
- columns : list, default None
If not None, only these columns will be read.
- row_groups : int, or list, or a list of lists, default None
If not None, specifies, for each input file, which row groups to read. If reading multiple inputs, a list of lists should be passed, one list for each input.
- skip_rows : int, default None
If not None, the number of rows to skip from the start of the file.
- num_rows : int, default None
If not None, the total number of rows to read.
- strings_to_categorical : boolean, default False
If True, return string columns as GDF_CATEGORY dtype; if False, return them as GDF_STRING dtype.
- use_pandas_metadata : boolean, default True
If True and the dataset has custom pandas schema metadata, ensure that index columns are also loaded.
- Returns
- DataFrame
Notes
cuDF supports local and remote data stores. See the cuDF configuration documentation for details on the available sources.
Examples
>>> import cudf
>>> df = cudf.read_parquet(filename)
>>> df
  num1                datetime text
0  123 2018-11-13T12:00:00.000 5451
1  456 2018-11-14T12:35:01.000 5784
2  789 2018-11-15T18:02:59.000 6117
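The row_groups parameter accepts an int, a list, or a list of lists, with one list per input file when several inputs are read. The hypothetical normalizer below turns those forms into one list of row-group indices per input and checks the count. It treats a bare int or a flat list as the selection for a single input, which is an assumption rather than something stated in this reference. cuDF performs its own validation.

```python
def normalize_row_groups(row_groups, num_inputs):
    """Hypothetical helper: return one list of row-group indices per input."""
    if row_groups is None:
        return None  # read every row group of every input
    if isinstance(row_groups, int):
        row_groups = [[row_groups]]       # assumption: selection for one input
    elif all(isinstance(rg, int) for rg in row_groups):
        row_groups = [list(row_groups)]   # assumption: selection for one input
    else:
        row_groups = [list(rg) for rg in row_groups]
    if len(row_groups) != num_inputs:
        raise ValueError(
            f"expected {num_inputs} row-group lists, got {len(row_groups)}")
    return row_groups


normalize_row_groups([[0], [1, 2]], 2)  # [[0], [1, 2]]: one list per file
```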
-
cudf.io.parquet.read_parquet_metadata(path)
Read a Parquet file’s metadata and schema.
- Parameters
- path : string or path object
Path of the file to be read.
- Returns
- Total number of rows
- Number of row groups
- List of column names
See also
Examples
>>> import cudf
>>> num_rows, num_row_groups, names = cudf.io.read_parquet_metadata(filename)
>>> df = [cudf.read_parquet(filename, row_groups=i) for i in range(num_row_groups)]
>>> df = cudf.concat(df)
>>> df
  num1                datetime text
0  123 2018-11-13T12:00:00.000 5451
1  456 2018-11-14T12:35:01.000 5784
2  789 2018-11-15T18:02:59.000 6117
-
cudf.io.parquet.to_parquet(df, path, engine='cudf', compression='snappy', index=None, partition_cols=None, statistics='ROWGROUP', metadata_file_path=None, *args, **kwargs)
Write a DataFrame to the parquet format.
- Parameters
- path : str
File path or root directory path. Used as the root directory path when writing a partitioned dataset.
- compression : {'snappy', 'gzip', 'brotli', None}, default 'snappy'
Name of the compression to use. Use None for no compression.
- index : bool, default None
If True, include the dataframe’s index(es) in the file output. If False, they will not be written to the file. If None, the engine’s default behavior will be used.
- partition_cols : list, optional, default None
Column names by which to partition the dataset. Columns are partitioned in the order they are given.
-
cudf.io.parquet.write_to_dataset(df, root_path, partition_cols=None, fs=None, preserve_index=False, return_metadata=False, **kwargs)
Wraps to_parquet to write partitioned Parquet datasets. For each combination of partition group and value, subdirectories are created as follows:
root_dir/
    group=value1
        <uuid>.parquet
    ...
    group=valueN
        <uuid>.parquet
- Parameters
- df : cudf.DataFrame
- root_path : string
The root directory of the dataset.
- fs : FileSystem, default None
If nothing is passed, paths are assumed to be in the local on-disk filesystem.
- preserve_index : bool, default False
Preserve index values in each parquet file.
- partition_cols : list
Column names by which to partition the dataset. Columns are partitioned in the order they are given.
- return_metadata : bool, default False
Return parquet metadata for written data. Returned metadata will include the file-path metadata (relative to root_path).
- **kwargs : dict
Keyword arguments for the to_parquet function.
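The directory layout above nests one col=value subdirectory per partition column, in the order the columns are given. The standard-library sketch below computes the target directory for each row under that assumption. partition_dirs is a hypothetical helper: it only groups row indices by directory and omits the <uuid>.parquet file names.

```python
import posixpath
from collections import defaultdict


def partition_dirs(root_path, rows, partition_cols):
    """Hypothetical helper: group row indices by their partition directory.

    rows is a list of dicts; partition_cols are nested in the order given.
    """
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        parts = [f"{col}={row[col]}" for col in partition_cols]
        groups[posixpath.join(root_path, *parts)].append(i)
    return dict(groups)


rows = [
    {"year": 2019, "month": 1, "v": 0.5},
    {"year": 2019, "month": 2, "v": 1.5},
    {"year": 2020, "month": 1, "v": 2.5},
    {"year": 2019, "month": 1, "v": 3.5},
]
layout = partition_dirs("root_dir", rows, ["year", "month"])
# {'root_dir/year=2019/month=1': [0, 3],
#  'root_dir/year=2019/month=2': [1],
#  'root_dir/year=2020/month=1': [2]}
```

Swapping the order of partition_cols swaps the nesting (month=1/year=2019 instead of year=2019/month=1).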
-
cudf.io.orc.read_orc(filepath_or_buffer, engine='cudf', columns=None, stripe=None, stripe_count=None, skip_rows=None, num_rows=None, use_index=True, decimals_as_float=True, force_decimal_scale=None, timestamp_type=None, **kwargs)
Load an ORC dataset into a DataFrame.
- Parameters
- filepath_or_buffer : str, path object, bytes, or file-like object
Either a path to a file (a str, pathlib.Path, or py._path.local.LocalPath), URL (including http, ftp, and S3 locations), Python bytes of raw binary data, or any object with a read() method (such as builtin open() file handler function or BytesIO).
- engine : {'cudf', 'pyarrow'}, default 'cudf'
Parser engine to use.
- columns : list, default None
If not None, only these columns will be read from the file.
- stripe : int, default None
If not None, only the stripe with the specified index will be read.
- skip_rows : int, default None
If not None, the number of rows to skip from the start of the file.
- num_rows : int, default None
If not None, the total number of rows to read.
- use_index : bool, default True
If True, use row index if available for faster seeking.
- **kwargs
Additional keyword arguments are passed to the engine.
- Returns
- DataFrame
Notes
cuDF supports local and remote data stores. See the cuDF configuration documentation for details on the available sources.
Examples
>>> import cudf
>>> df = cudf.read_orc(filename)
>>> df
  num1                datetime text
0  123 2018-11-13T12:00:00.000 5451
1  456 2018-11-14T12:35:01.000 5784
2  789 2018-11-15T18:02:59.000 6117
-
cudf.io.orc.read_orc_metadata(path)
Read an ORC file’s metadata and schema.
- Parameters
- path : string or path object
Path of the file to be read.
- Returns
- Total number of rows
- Number of stripes
- List of column names
See also
Examples
>>> import cudf
>>> num_rows, stripes, names = cudf.io.read_orc_metadata(filename)
>>> df = [cudf.read_orc(filename, stripe=i) for i in range(stripes)]
>>> df = cudf.concat(df)
>>> df
  num1                datetime text
0  123 2018-11-13T12:00:00.000 5451
1  456 2018-11-14T12:35:01.000 5784
2  789 2018-11-15T18:02:59.000 6117
-
cudf.io.orc.to_orc(df, fname, compression=None, enable_statistics=True, **kwargs)
Write a DataFrame to the ORC format.
- Parameters
- fname : str
File path or object where the ORC dataset will be stored.
- compression : {'snappy', None}, default None
Name of the compression to use. Use None for no compression.
- enable_statistics : boolean, default True
Enable writing column statistics.
See also
-
cudf.io.json.read_json(path_or_buf, engine='auto', dtype=True, lines=False, compression='infer', byte_range=None, *args, **kwargs)
Load a JSON dataset into a DataFrame.
- Parameters
- path_or_buf : str, path object, or file-like object
Either JSON data in a str, path to a file (a str, pathlib.Path, or py._path.local.LocalPath), URL (including http, ftp, and S3 locations), or any object with a read() method (such as builtin open() file handler function or StringIO).
- engine : {'auto', 'cudf', 'pandas'}, default 'auto'
Parser engine to use. If 'auto' is passed, the engine will be automatically selected based on the other parameters.
- orient : string
Indication of the expected JSON string format (pandas engine only). Compatible JSON strings can be produced by to_json() with a corresponding orient value. The set of possible orients is:
'split': dict like {index -> [index], columns -> [columns], data -> [values]}
'records': list like [{column -> value}, ... , {column -> value}]
'index': dict like {index -> {column -> value}}
'columns': dict like {column -> {index -> value}}
'values': just the values array
The allowed and default values depend on the value of the typ parameter.
When typ == 'series':
allowed orients are {'split', 'records', 'index'}
default is 'index'
The Series index must be unique for orient 'index'.
When typ == 'frame':
allowed orients are {'split', 'records', 'index', 'columns', 'values', 'table'}
default is 'columns'
The DataFrame index must be unique for orients 'index' and 'columns'.
The DataFrame columns must be unique for orients 'index', 'columns', and 'records'.
- typ : {'frame', 'series'}, default 'frame'
The type of object to recover. With the cudf engine, only frame output is supported.
- dtype : boolean or dict, default True
If True, infer dtypes; if a dict of column to dtype, then use those; if False, then don't infer dtypes at all. Applies only to the data.
- convert_axes : boolean, default True
Try to convert the axes to the proper dtypes (pandas engine only).
- convert_dates : boolean or list, default True
List of columns to parse for dates (pandas engine only). If True, then try to parse datelike columns. A column label is datelike if
it ends with '_at',
it ends with '_time',
it begins with 'timestamp',
it is 'modified', or
it is 'date'.
- keep_default_dates : boolean, default True
If parsing dates, parse the default datelike columns (pandas engine only).
- numpy : boolean, default False
Direct decoding to numpy arrays (pandas engine only). Supports numeric data only, but non-numeric column and index labels are supported. Note also that the JSON ordering MUST be the same for each term if numpy=True.
- precise_float : boolean, default False
Set to enable usage of higher precision (strtod) function when decoding string to double values (pandas engine only). Default (False) is to use fast but less precise builtin functionality.
- date_unit : string, default None
The timestamp unit to detect if converting dates (pandas engine only). The default behavior is to try and detect the correct precision, but if this is not desired then pass one of 's', 'ms', 'us' or 'ns' to force parsing only seconds, milliseconds, microseconds or nanoseconds.
- encoding : str, default 'utf-8'
The encoding to use to decode py3 bytes. With the cudf engine, only utf-8 is supported.
- lines : boolean, default False
Read the file as one JSON object per line.
- chunksize : integer, default None
Return JsonReader object for iteration (pandas engine only). See the line-delimited json docs for more information on chunksize. This can only be passed if lines=True. If this is None, the file will be read into memory all at once.
- compression : {'infer', 'gzip', 'bz2', 'zip', 'xz', None}, default 'infer'
For on-the-fly decompression of on-disk data. If 'infer', then use gzip, bz2, zip or xz if path_or_buf is a string ending in '.gz', '.bz2', '.zip', or '.xz', respectively, and no decompression otherwise. If using 'zip', the ZIP file must contain only one data file to be read in. Set to None for no decompression.
- byte_range : list or tuple, default None
Byte range within the input file to be read (cudf engine only). The first number is the offset in bytes, the second number is the range size in bytes. Set the size to zero to read all data after the offset location. Reads the row that starts before or at the end of the range, even if it ends after the end of the range.
- Returns
- result : Series or DataFrame, depending on the value of typ.
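With compression='infer', the codec is chosen from the file extension when path_or_buf is a string. The sketch below models that lookup using the extensions listed for this parameter. infer_compression is a hypothetical helper, not a cuDF function.

```python
def infer_compression(path_or_buf, compression="infer"):
    """Hypothetical helper: resolve compression='infer' from the file suffix."""
    if compression != "infer":
        return compression  # an explicit codec (or None) is used as given
    if not isinstance(path_or_buf, str):
        return None  # only string paths are inspected
    suffixes = {".gz": "gzip", ".bz2": "bz2", ".zip": "zip", ".xz": "xz"}
    for suffix, codec in suffixes.items():
        if path_or_buf.endswith(suffix):
            return codec
    return None  # no recognized suffix: no decompression


infer_compression("events.json.gz")  # 'gzip'
infer_compression("events.json")     # None
```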
See also
-
cudf.io.json.to_json(cudf_val, path_or_buf=None, *args, **kwargs)
Convert the cuDF object to a JSON string. Note that nulls and NaNs will be converted to null, and datetime objects will be converted to UNIX timestamps.
- Parameters
- path_or_buf : string or file handle, optional
File path or object. If not specified, the result is returned as a string.
- orient : string
Indication of expected JSON string format.
- Series
default is 'index'
allowed values are: {'split', 'records', 'index', 'table'}
- DataFrame
default is 'columns'
allowed values are: {'split', 'records', 'index', 'columns', 'values', 'table'}
- The format of the JSON string
'split' : dict like {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}
'records' : list like [{column -> value}, ... , {column -> value}]
'index' : dict like {index -> {column -> value}}
'columns' : dict like {column -> {index -> value}}
'values' : just the values array
'table' : dict like {'schema': {schema}, 'data': {data}} describing the data, where the data component is like orient='records'.
- date_format : {None, 'epoch', 'iso'}
Type of date conversion. 'epoch' = epoch milliseconds, 'iso' = ISO8601. The default depends on the orient. For orient='table', the default is 'iso'. For all other orients, the default is 'epoch'.
- double_precision : int, default 10
The number of decimal places to use when encoding floating point values.
- force_ascii : bool, default True
Force encoded string to be ASCII.
- date_unit : string, default 'ms' (milliseconds)
The time unit to encode to, governs timestamp and ISO8601 precision. One of 's', 'ms', 'us', 'ns' for second, millisecond, microsecond, and nanosecond respectively.
- default_handler : callable, default None
Handler to call if object cannot otherwise be converted to a suitable format for JSON. Should receive a single argument which is the object to convert and return a serializable object.
- lines : bool, default False
If orient is 'records', write out line-delimited JSON format. Raises ValueError for any other orient, since the others are not list-like.
- compression : {'infer', 'gzip', 'bz2', 'zip', 'xz', None}
A string representing the compression to use in the output file, only used when the first argument is a filename. By default, the compression is inferred from the filename.
- index : bool, default True
Whether to include the index values in the JSON string. Not including the index (index=False) is only supported when orient is 'split' or 'table'.
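The orient layouts above are easier to compare on a concrete table. The sketch below builds each documented layout for a two-row table with plain dicts and json.dumps. It also builds the line-delimited form used when lines=True with orient='records'. It models the shapes only: to_orient and to_json_lines are hypothetical helpers, and cuDF's real output (number formatting, dates, the 'table' schema) is not reproduced.

```python
import json


def to_orient(index, columns, data, orient):
    """Hypothetical helper: serialize a small table (a list of rows)."""
    if orient == "split":
        obj = {"columns": columns, "index": index, "data": data}
    elif orient == "records":
        obj = [dict(zip(columns, row)) for row in data]
    elif orient == "index":
        obj = {str(i): dict(zip(columns, row)) for i, row in zip(index, data)}
    elif orient == "columns":
        obj = {c: {str(i): row[j] for i, row in zip(index, data)}
               for j, c in enumerate(columns)}
    elif orient == "values":
        obj = data
    else:
        raise ValueError(f"unsupported orient: {orient!r}")
    return json.dumps(obj, separators=(",", ":"))


def to_json_lines(columns, data):
    """lines=True with orient='records': one JSON object per line."""
    return "\n".join(json.dumps(dict(zip(columns, row)), separators=(",", ":"))
                     for row in data)


index, columns, data = [0, 1], ["a", "b"], [[1, "x"], [2, "y"]]
print(to_orient(index, columns, data, "columns"))  # {"a":{"0":1,"1":2},"b":{"0":"x","1":"y"}}
print(to_json_lines(columns, data))
```

'records' and 'values' produce JSON arrays, so only those layouts are list-like. That is why line-delimited output is tied to orient='records'.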
See also
-
cudf.io.avro.read_avro(filepath_or_buffer, engine='cudf', columns=None, skip_rows=None, num_rows=None, **kwargs)
Load an Avro dataset into a DataFrame.
- Parameters
- filepath_or_buffer : str, path object, bytes, or file-like object
Either a path to a file (a str, pathlib.Path, or py._path.local.LocalPath), URL (including http, ftp, and S3 locations), Python bytes of raw binary data, or any object with a read() method (such as builtin open() file handler function or BytesIO).
- engine : {'cudf', 'fastavro'}, default 'cudf'
Parser engine to use.
- columns : list, default None
If not None, only these columns will be read.
- skip_rows : int, default None
If not None, the number of rows to skip from the start of the file.
- num_rows : int, default None
If not None, the total number of rows to read.
- Returns
- DataFrame
Notes
cuDF supports local and remote data stores. See the cuDF configuration documentation for details on the available sources.
Examples
>>> import cudf
>>> df = cudf.read_avro(filename)
>>> df
  num1                datetime text
0  123 2018-11-13T12:00:00.000 5451
1  456 2018-11-14T12:35:01.000 5784
2  789 2018-11-15T18:02:59.000 6117
-
cudf.io.dlpack.from_dlpack(pycapsule_obj)
Converts from a DLPack tensor to a cuDF object.
DLPack is an open-source memory tensor structure: dmlc/dlpack.
This function takes a PyCapsule object which contains a pointer to a DLPack tensor as input, and returns a cuDF object. This function deep copies the data in the DLPack tensor into a cuDF object.
- Parameters
- pycapsule_obj : PyCapsule
Input DLPack tensor pointer which is encapsulated in a PyCapsule object.
- Returns
- A cuDF Series if the input DLPack tensor is 1D, or a cuDF DataFrame if it is 2D.
-
cudf.io.dlpack.to_dlpack(cudf_obj)
Converts a cuDF object to a DLPack tensor.
DLPack is an open-source memory tensor structure: dmlc/dlpack.
This function takes a cuDF object as input and returns a PyCapsule object that contains a pointer to a DLPack tensor. This function deep copies the data in the cuDF object into the DLPack tensor.
- Parameters
- cudf_obj : cuDF object
Input cuDF object.
- Returns
- A DLPack tensor pointer which is encapsulated in a PyCapsule object.
-
cudf.io.feather.read_feather(path, *args, **kwargs)
Load a feather object from the file path, returning a DataFrame.
- Parameters
- path : string
File path.
- columns : list, default None
If not None, only these columns will be read from the file.
- Returns
- DataFrame
See also
Examples
>>> import cudf
>>> df = cudf.read_feather(filename)
>>> df
  num1                datetime text
0  123 2018-11-13T12:00:00.000 5451
1  456 2018-11-14T12:35:01.000 5784
2  789 2018-11-15T18:02:59.000 6117
-
cudf.io.feather.to_feather(df, path, *args, **kwargs)
Write a DataFrame to the feather format.
- Parameters
- path : str
File path
See also
-
cudf.io.hdf.read_hdf(path_or_buf, *args, **kwargs)
Read from the store, closing it if it was opened by this function.
Retrieve a pandas object stored in a file, optionally based on where criteria.
- Parameters
- path_or_buf : string, buffer or path object
Path to the file to open, or an open HDFStore object. Supports any object implementing the __fspath__ protocol. This includes pathlib.Path and py._path.local.LocalPath objects.
- key : object, optional
The group identifier in the store. Can be omitted if the HDF file contains a single pandas object.
- mode : {'r', 'r+', 'a'}, optional
Mode to use when opening the file. Ignored if path_or_buf is a pandas HDFStore. Default is 'r'.
- where : list, optional
A list of Term (or convertible) objects.
- start : int, optional
Row number to start selection.
- stop : int, optional
Row number to stop selection.
- columns : list, optional
A list of column names to return.
- iterator : bool, optional
Return an iterator object.
- chunksize : int, optional
Number of rows to include in an iteration when using an iterator.
- errors : str, default 'strict'
Specifies how encoding and decoding errors are to be handled. See the errors argument for open() for a full list of options.
- **kwargs
Additional keyword arguments passed to HDFStore.
- Returns
- item : object
The selected object. Return type depends on the object stored.
See also
cudf.io.hdf.to_hdf : Write an HDF file from a DataFrame.
-
cudf.io.hdf.to_hdf(path_or_buf, key, value, *args, **kwargs)
Write the contained data to an HDF5 file using HDFStore.
Hierarchical Data Format (HDF) is self-describing, allowing an application to interpret the structure and contents of a file with no outside information. One HDF file can hold a mix of related objects which can be accessed as a group or as individual objects.
To add another DataFrame or Series to an existing HDF file, use append mode and a different key.
For more information, see the user guide.
- Parameters
- path_or_buf : str or pandas.HDFStore
File path or HDFStore object.
- key : str
Identifier for the group in the store.
- mode : {'a', 'w', 'r+'}, default 'a'
Mode to open file:
'w': write, a new file is created (an existing file with the same name would be deleted).
'a': append, an existing file is opened for reading and writing, and if the file does not exist it is created.
'r+': similar to 'a', but the file must already exist.
- format : {'fixed', 'table'}, default 'fixed'
Possible values:
'fixed': Fixed format. Fast writing/reading. Neither appendable nor searchable.
'table': Table format. Write as a PyTables Table structure, which may perform worse but allows more flexible operations like searching / selecting subsets of the data.
- append : bool, default False
For Table formats, append the input data to the existing data.
- data_columns : list of columns or True, optional
List of columns to create as indexed data columns for on-disk queries, or True to use all columns. By default only the axes of the object are indexed. See Query via Data Columns. Applicable only to format='table'.
- complevel : {0-9}, optional
Specifies a compression level for data. A value of 0 disables compression.
- complib : {'zlib', 'lzo', 'bzip2', 'blosc'}, default 'zlib'
Specifies the compression library to be used. As of pandas v0.20.2 these additional compressors for Blosc are supported (default if no compressor specified: 'blosc:blosclz'): {'blosc:blosclz', 'blosc:lz4', 'blosc:lz4hc', 'blosc:snappy', 'blosc:zlib', 'blosc:zstd'}. Specifying a compression library which is not available issues a ValueError.
- fletcher32 : bool, default False
If applying compression, use the fletcher32 checksum.
- dropna : bool, default False
If True, rows that are entirely NaN will not be written to the store.
- errors : str, default 'strict'
Specifies how encoding and decoding errors are to be handled. See the errors argument for open() for a full list of options.
See also
cudf.io.hdf.read_hdf : Read from an HDF file.
cudf.io.parquet.to_parquet : Write a DataFrame to the binary parquet format.
cudf.io.feather.to_feather : Write out feather-format for DataFrames.
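The three mode values of to_hdf differ only in how they treat a file that already exists, and complevel must lie in the range 0 to 9. The hypothetical helper below encodes those documented rules as a pre-flight check. It never opens the file or calls PyTables; it only reports what the documented mode would do.

```python
import os


def check_hdf_write_options(path, mode="a", complevel=None):
    """Hypothetical helper: describe what to_hdf's `mode` implies for `path`."""
    if complevel is not None and not 0 <= complevel <= 9:
        raise ValueError("complevel must be between 0 and 9")
    exists = os.path.exists(path)
    if mode == "w":
        return "replace" if exists else "create"  # existing file is deleted
    if mode == "a":
        return "open" if exists else "create"     # read/write, created if missing
    if mode == "r+":
        if not exists:
            raise FileNotFoundError(path)          # file must already exist
        return "open"
    raise ValueError(f"invalid mode: {mode!r}")


check_hdf_write_options("new_store.h5", mode="a")  # 'create' if the file is absent
```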